C++ XML Parser rant 2005-03-14


I've been toying with Atom over the weekend. Or should I say, I've been getting frustrated over the quality (or lack thereof) of C++ XML parsers.

We have the C++ version of Xerces, but I've never liked it. That's probably because of its refusal to use namespaces and exceptions, and because the cumbersome (and slow!) string handling that results from the way it handles i18n makes it painful to work with. On top of that, the new API completely reverts to an old C++ style full of raw pointers instead of the RAII idiom or similar, more modern techniques such as the Boost smart pointers.
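To make the ownership complaint concrete, here is a minimal sketch of RAII-style node ownership. It is not Xerces code: `Node`, `appendChild`, `buildFeed`, and the live-node counter are hypothetical names invented for illustration. It also swaps in `std::unique_ptr` from later standard C++, playing the role the Boost smart pointers would play.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type, for illustration only; not part of Xerces or any real library.
class Node {
public:
    explicit Node(std::string name) : name_(std::move(name)) { ++live_; }
    ~Node() { --live_; }

    // Nodes are uniquely owned, so copying is disabled.
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;

    // The parent takes ownership of the child; the returned raw pointer is non-owning.
    Node* appendChild(std::unique_ptr<Node> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
        return children_.back().get();
    }

    const std::string& name() const { return name_; }
    std::size_t childCount() const { return children_.size(); }

    // Number of Node objects currently alive, used only to show that nothing leaks.
    static int liveCount() { return live_; }

private:
    std::string name_;
    std::vector<std::unique_ptr<Node>> children_;
    static int live_;
};

int Node::live_ = 0;

// Builds a small tree; the caller owns the root, and the root owns everything below it.
std::unique_ptr<Node> buildFeed() {
    auto feed = std::make_unique<Node>("feed");
    Node* entry = feed->appendChild(std::make_unique<Node>("entry"));
    entry->appendChild(std::make_unique<Node>("title"));
    return feed;
}
```

When the pointer returned by `buildFeed()` goes out of scope, the whole tree is destroyed with it, with no explicit release call anywhere in the calling code.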

libxml is very featureful, but it's plain C, and the available C++ wrappers leave much to be desired. It also doesn't provide a proper DOM interface, only an approximation that I find painful to use (because in many places it diverges from the W3C DOM for no good reason other than legacy). Documentation is also an issue - more than once I've resorted to reading the header files while looking for something that should have been trivial.

For SAX work Expat is a tolerable alternative, and there are some C++ wrappers out there. But it doesn't provide a DOM.

One possible solution to that might be Arabica, which looks very promising as a DOM layer on top of Expat or libxml, but so far I haven't gotten it to compile at all, and I haven't had much time to spend on it...
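For a rough idea of what "a DOM layer on top of a SAX parser" involves, here is a small sketch of a tree builder driven by SAX-style events. It is not Arabica's design and does not call Expat; `Element`, `TreeBuilder`, and its methods are hypothetical names. With Expat, the three event methods would presumably be invoked from the C callbacks registered through `XML_SetElementHandler` and `XML_SetCharacterDataHandler`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type for illustration; far simpler than a W3C DOM Element.
struct Element {
    std::string name;
    std::string text;                                // concatenated character data
    std::vector<std::unique_ptr<Element>> children;  // owned child elements
};

// Hypothetical builder: a SAX-style parser would call these methods from its
// start-tag, character-data, and end-tag callbacks.
class TreeBuilder {
public:
    void startElement(const std::string& name) {
        auto element = std::make_unique<Element>();
        element->name = name;
        Element* raw = element.get();
        if (open_.empty()) {
            assert(!root_ && "a document has exactly one root element");
            root_ = std::move(element);
        } else {
            open_.back()->children.push_back(std::move(element));
        }
        open_.push_back(raw);  // this element is now the innermost open one
    }

    void characters(const std::string& data) {
        // Character data outside the root element is ignored in this sketch.
        if (!open_.empty()) open_.back()->text += data;
    }

    void endElement(const std::string& name) {
        (void)name;  // only inspected by the assertion below
        assert(!open_.empty() && open_.back()->name == name);
        open_.pop_back();
    }

    // Hands the finished tree to the caller; only valid once every element is closed.
    std::unique_ptr<Element> takeRoot() {
        assert(open_.empty());
        return std::move(root_);
    }

private:
    std::unique_ptr<Element> root_;
    std::vector<Element*> open_;  // stack of currently open elements (non-owning)
};
```

Feeding it the events for `<feed><title>Hello, Atom</title></feed>` leaves a `feed` element with one `title` child. A real DOM layer still has to deal with attributes, namespaces, text nodes, and the rest of the W3C interfaces, which is where most of the work is.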

WHY is this so hard? I feel tempted to write a parser myself, or at least a DOM implementation - I've written parts of one before and have also spent quite some time reading the Xerces source. I don't ask for much - DOM Level 1 Core and some minor parts of Level 2 would be enough...

