Once again: no excuses to ignore i18n in XML

Uche Ogbuji

Aug. 17, 2004 09:32 PM

URL: http://www.javareport.com/article.asp?id=9797...

I think the most pervasive problem in XML adoption is ignorance and even wilful sabotage of the international foundation on which XML is built. In several recent incidents, both in my consulting work and in my OSS/community work, I have come across systems that ignore or break XML's Unicode character model.

I've almost grown tired of saying it, but it is worth saying until I've worked through my very last nerve: the single most important aspect of XML is its character model. Ditch XML and use something else before you mess with that. A tremendous amount of damage is done by people who can't see past the pointy brackets as the point of XML.

Yes, Unicode is hard. There is nothing to be done about this. We have a myriad of languages, writing systems and local conventions, and they complicate just about everything. That's our wacky, wondrous world for you. Nevertheless, as a software professional in this age, there is no excuse not to buckle down and learn the rigors of i18n. I don't mean to be a pedant about this: I know a lot less about i18n than I wish I did, and I fall short of good i18n in much of my code. However, I respect the problem and I strive to work on my skills in the area, and on my discipline in applying them in software development.

If you use XML in your work, please read "The skew.org XML Tutorial. A reintroduction to XML with an emphasis on character encoding", by Mike Brown (a truly brilliant article). You might also want to check out my article "Proper XML Output in Python". Even if you're not a Python programmer, you might find some use in its discussion of common character problems when generating XML.
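To make "common character problems when generating XML" concrete, here is a minimal sketch in Python. It is not taken from either article mentioned above; it only assumes the standard library's xml.etree.ElementTree module, and the function names serialize_note and naive_note are hypothetical, chosen for illustration. The point is that text goes into the document as Unicode, and the serializer, not hand-rolled string concatenation, decides how to escape and encode it.

```python
import xml.etree.ElementTree as ET


def serialize_note(text: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: build a one-element document and let the
    serializer take care of escaping markup characters and encoding."""
    note = ET.Element("note")
    note.text = text  # plain Unicode text, no manual escaping
    return ET.tostring(note, encoding=encoding)


def naive_note(text: str) -> str:
    """Anti-pattern shown for contrast: string concatenation, no escaping."""
    return "<note>" + text + "</note>"


sample = "Ọlá & <señor> 日本語"

# UTF-8 output: non-ASCII characters are encoded directly, markup is escaped.
utf8_doc = serialize_note(sample)

# ASCII output: the same characters survive as numeric character references,
# because the character model is Unicode whatever the byte encoding is.
ascii_doc = serialize_note(sample, encoding="us-ascii")

# Both serializations parse back to exactly the original text.
round_trip_utf8 = ET.fromstring(utf8_doc).text
round_trip_ascii = ET.fromstring(ascii_doc).text

# The concatenated version is not even well-formed XML.
try:
    ET.fromstring(naive_note(sample))
    naive_is_well_formed = True
except ET.ParseError:
    naive_is_well_formed = False
```

Whatever toolkit you use, the test worth keeping is the round trip: serialize text containing markup characters and non-ASCII characters, parse the result, and check that you get the same string back.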

Uche Ogbuji is a Partner at Zepheira, LLC, a solutions firm specializing in the next generation of Web technologies.


Weblog authors are solely responsible for the content and accuracy of their weblogs, including opinions they express, and O'Reilly Media, Inc., disclaims any and all liability for that content, its accuracy, and opinions it may contain.

Creative Commons License This work is licensed under a Creative Commons License.