Second Year at OOP Conference
It was our second year participating in the OOP conference in
I was a bit surprised by how many people told us they have trouble dealing with large XML files. We call this VLM (Very Large Messaging), and we are very familiar with the issues surrounding it. The most common issue is memory consumption: typical DOM-based parsers use about 6 times the size of the original XML file in memory, so a 1 GB document can need roughly 6 GB of RAM. When you start dealing with large files, that quickly becomes a problem. The other main issue is performance. Parsing large files is time-consuming, especially when unexpected things like garbage collection kick in.
We have been solving this problem for a while now with our HydraSDO for XML product. Because it is written in C++, it is very fast. It also provides a native Java API that uses shared memory so that performance is not affected, and it uses a non-extractive, indexed parsing model whose memory footprint is about 1.5 times the size of the original XML document.
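To make the difference between a DOM tree and a "non-extractive, indexed" parse more concrete, here is a toy Python sketch. It keeps the raw bytes in memory once and records only (kind, start, end) offsets into them, handing out zero-copy memoryview slices on demand instead of building string-filled node objects. This is not HydraSDO's implementation or API: index_xml, view, and estimated_footprint are hypothetical names, the parser assumes simple XML with no comments, CDATA, processing instructions, or ">" inside attribute values, and the 6x and 1.5x multipliers are simply the rough figures quoted in this post.

```python
# Toy sketch of non-extractive, indexed XML parsing (hypothetical, not HydraSDO).
# Assumption: simple well-formed XML without comments, CDATA, processing
# instructions, DOCTYPE, or ">" inside attribute values.
from typing import List, NamedTuple

WS = b" \t\r\n"

# Rough memory multipliers quoted in the post (relative to file size).
DOM_MULTIPLIER = 6.0
INDEXED_MULTIPLIER = 1.5


class Token(NamedTuple):
    kind: str   # "start", "end", "empty" or "text"
    start: int  # offset of the first byte of the element name or text
    end: int    # offset one past the last byte


def _has_content(buf: bytes, a: int, b: int) -> bool:
    """True if buf[a:b] holds any non-whitespace byte (checked without slicing)."""
    return any(buf[k] not in WS for k in range(a, b))


def index_xml(buf: bytes) -> List[Token]:
    """Scan buf once and return offset records; names and text are never copied."""
    tokens: List[Token] = []
    i, n = 0, len(buf)
    while i < n:
        lt = buf.find(b"<", i)
        if lt == -1:
            lt = n
        # Text between tags is recorded only as an offset pair.
        if lt > i and _has_content(buf, i, lt):
            tokens.append(Token("text", i, lt))
        if lt == n:
            break
        gt = buf.find(b">", lt)
        if gt == -1:
            raise ValueError(f"unterminated tag at offset {lt}")
        if buf[lt + 1] == ord("/"):
            kind, name_start = "end", lt + 2
        elif buf[gt - 1] == ord("/"):
            kind, name_start = "empty", lt + 1
        else:
            kind, name_start = "start", lt + 1
        # The element name ends at the first whitespace, "/" or ">".
        name_end = name_start
        while buf[name_end] not in WS and buf[name_end] not in b"/>":
            name_end += 1
        tokens.append(Token(kind, name_start, name_end))
        i = gt + 1
    return tokens


def view(buf: bytes, tok: Token) -> memoryview:
    """Zero-copy view of a token's bytes in the original buffer."""
    return memoryview(buf)[tok.start:tok.end]


def estimated_footprint(file_bytes: int, multiplier: float) -> float:
    """Back-of-the-envelope memory estimate: file size times a multiplier."""
    return file_bytes * multiplier
```

For example, index_xml(b"<order id='7'><item>pen</item><note/></order>") yields six small offset records (start order, start item, text pen, end item, empty note, end order), while the document bytes themselves are stored only once. A real indexed parser would also need to handle attributes, entities, namespaces, and validation, which is where most of the engineering effort goes.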
I’m excited for the opportunity to help these people with the large XML file problems they’re having. If you’re having similar issues, let me know and we can get you some quick help.

April 26th, 2008 at 4:02 am
Interesting post on large XML file processing. I have been looking at the Intel XML Software Suite, and they claim to support files of up to 20 GB. How does the Rogue Wave solution compare?
see http://www.intel.com/software/xml
June 16th, 2008 at 2:24 pm
Hi Matt, sorry for the VERY late reply. I didn’t realize that you had left a comment. We haven’t tested up to 20 GB, but we have tested a 4 GB file. Up to 4 GB, HydraSDO scaled linearly in both memory footprint and performance.
Most people we speak with use files under 1 GB in size; in fact, we have seen only one case where someone had a 4 GB file. Are you seeing XML files larger than 1 GB in use?