I've been doing more and more work with XML, and my appreciation for that family of technologies is growing by the day. XML and open data standards solved a problem that arose with OpenOffice.org Writer a few weeks ago.
OpenOffice.org is (of course) an open source office suite. I've been using a pre-2.0 test version of the software, and it has demonstrated a few instabilities but also has some great new features.
I spent all of Tuesday evening using OOo to draft a detailed outline for a book. The outline contained nearly 200 entries, and while the resulting document was fairly short -- only a few pages long -- it represented a lot of work.
You can imagine my dismay when the document wouldn't open the next morning. "Read Error," the program whined. Something incomprehensible about a format error at (2,2847) in styles.xml.
Not a problem, I thought -- I'll just use the backup that I'd saved. Same error! The previous version of the file -- same error!
If I'd written that document in WordPerfect or MS Word, that would have been the end of the story. I'd probably have had to rewrite it from scratch. I know it happens; I've been there.
But OOo 2.0 uses a document format that is an OASIS standard, which means that it's publicly documented XML. Actually, an OOo document is a zip archive containing multiple XML files.
So I unzipped the OOo archive and checked styles.xml with xmlwf, which tests whether the XML is well-formed. (Well-formedness is step one of two on the road to correctness; the second hurdle is validity according to the schema.) Sure enough, there was a duplicate attribute on an element at the line and column indicated in the cryptic OOo error message.
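For anyone who would rather script that check than run xmlwf by hand, here is a minimal Python sketch of the same idea. It is not the command I actually ran; it reads one member straight out of the zip archive and feeds it to the standard library's expat bindings (xmlwf is itself built on expat). The function name `check_well_formed` and the default member name are just illustrative choices.

```python
import zipfile
import xml.parsers.expat


def check_well_formed(archive, member="styles.xml"):
    """Check one XML member of a zip archive for well-formedness.

    `archive` may be a path or a file-like object.
    Returns None if the member parses cleanly, otherwise a tuple
    (line, column, message) for the first error expat reports.
    expat counts lines from 1 and columns from 0.
    """
    with zipfile.ZipFile(archive) as zf:
        data = zf.read(member)

    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final (and only) chunk of the document.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None
```

Like xmlwf, this only tests well-formedness and stops at the first error, so a file with several problems has to be checked again after each fix. It says nothing about validity against the schema.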
Edit it out, zip it back up, try again, and ... same error, different location. But after a couple of iterations the problem was fixed.
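The "zip it back up" step can also be scripted. The sketch below is one way to do it under a few assumptions: it copies every member of the original archive into a new one, in the original order and with each member's original compression setting, swapping in the corrected bytes for a single file. Preserving order matters because OpenDocument packages expect the `mimetype` entry to come first and be stored uncompressed. The name `replace_member` is hypothetical, and the corrected XML is assumed to have been prepared separately.

```python
import zipfile


def replace_member(src, dst, member, new_data):
    """Copy the zip archive `src` to `dst`, replacing one member's contents.

    `src` and `dst` may be paths or file-like objects; `new_data` is the
    corrected file content as bytes. All other members are copied as-is.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            if info.filename == member:
                data = new_data
            else:
                data = zin.read(info.filename)
            # Reusing the original ZipInfo keeps the member's name,
            # timestamp, and compression type in the new archive.
            zout.writestr(info, data)
```

Writing to a new file instead of modifying the original in place means the damaged document is still there if a repair attempt goes wrong.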
Sure, it was a pain, and sure, it should never have happened. But in an imperfect world, I'd much rather have my data in an accessible format that can be manipulated by many different tools than locked up in an undocumented, proprietary format.
Chris Tyler is a programmer and Linux network administrator with a focus on the X Window System and LAMP. He has programmed in two dozen different languages over the past 20 years, and now teaches at Seneca College, Toronto.
oreillynet.com Copyright © 2006 O'Reilly Media, Inc.