Wednesday, April 1, 2009

SAX, DOM, JAXP, & JDOM. Evolution of the Java-XML combo.

Evolution of XML Parsing/Manipulation using Java

The combination of Java and XML has been one of the most attractive things to happen in the field of software development in the 21st century, mainly for two reasons - Java is arguably the most widely used programming language, and XML is almost unarguably the best mechanism for data description and transfer.

Since these two are different technologies, a developer initially needed a sound understanding of both before he could make the best use of the combination. Since then there has been a paradigm shift towards Java, and we have seen a few interesting technologies evolve to make this happen. Some of them are:

SAX - Simple API for XML

It was the first to come on the scene and, interestingly, it was developed on the XML-DEV mailing list. Evidently the people who developed it were XML gurus, and that is quite visible in the usage of this API. You need a fair understanding of XML to use it, but at least Java developers got something that combined the two worlds - Java and XML - in a structured way. It instantly became a hit for obvious reasons.

Being the first on the evolution ladder, it obviously had only basic support for XML processing. It is an event-based technology: the parser reads the XML document sequentially and invokes callbacks (start of an element, character data, end of an element and so on) as it encounters each part. This effectively means you can't go back to some part which was read/processed previously - if you do have such a requirement then you need to store/manage the relevant data yourself.

Since this API does not require the entire XML doc to be loaded into memory, and because it offers only sequential processing of the doc, it is quite fast. Another reason for its speed is that it does not allow modification of the underlying XML data.
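To make the callback idea concrete, here is a minimal sketch using the SAX classes that ship with the JDK. The class name SaxTitleCollector, the sample XML and the element name "title" are made up for illustration. The handler has to remember for itself whether it is inside a title element, which is exactly the "store/manage the relevant data yourself" point made above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <title> element it sees.
class SaxTitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean insideTitle;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Called when the parser reaches an opening tag.
        if ("title".equals(qName)) {
            insideTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so accumulate.
        if (insideTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Called when the parser reaches a closing tag.
        if ("title".equals(qName)) {
            titles.add(text.toString().trim());
            insideTitle = false;
        }
    }

    static List<String> collectTitles(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxTitleCollector handler = new SaxTitleCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>Java</title></book>"
                   + "<book><title>XML</title></book></books>";
        System.out.println(collectTitles(xml)); // prints [Java, XML]
    }
}
```

Notice that nothing here holds the whole document: once endElement has fired for a book, the parser has moved on and only what the handler chose to keep is still available.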

Interested in going through a step-by-step implementation (with an explanation of the complete source code) of a simple SAX Parser in Java using SAX2 APIs? Here it is - SAX Parser Implementation in Java >>

DOM - Document Object Model

The Java binding for DOM provides a tree-based representation of XML documents - allowing random access to and modification of the underlying XML data. Since a tree has to be built and kept available for this, it is not very difficult to deduce that it is generally slower and more memory-hungry than SAX.

The event-based callback methodology is replaced by an object-oriented in-memory representation of the XML document. Whether the entire document or only a part of it is kept in memory at a particular instant differs from one implementation to another, but Java developers are kept out of all that hassle and get the entire tree readily available whenever they wish.
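A small sketch of what random access and modification look like with the DOM classes bundled in the JDK. The class name DomEditSketch, the helper methods and the sample XML are assumptions made for this example only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class showing tree-style access to an XML document.
class DomEditSketch {
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Random access: jump straight to the n-th <title>, in any order, any number of times.
    static String titleAt(Document doc, int index) {
        NodeList titles = doc.getElementsByTagName("title");
        return titles.item(index).getTextContent();
    }

    // Modification: append a new <book><title>...</title></book> under the root element.
    static void addBook(Document doc, String title) {
        Element book = doc.createElement("book");
        Element titleElement = doc.createElement("title");
        titleElement.setTextContent(title);
        book.appendChild(titleElement);
        doc.getDocumentElement().appendChild(book);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<books><book><title>Java</title></book>"
                           + "<book><title>XML</title></book></books>");
        System.out.println(titleAt(doc, 1)); // prints XML
        System.out.println(titleAt(doc, 0)); // prints Java (going "back" is fine here)
        addBook(doc, "JDOM");
        System.out.println(titleAt(doc, 2)); // prints JDOM
    }
}
```

Note the NodeList return type: it is a DOM-specific structure rather than a java.util collection.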

JAXP - Java API for XML Processing (originally 'Java API for XML Parsing')

The creators and designers of Java realized that Java developers should not have to be XML gurus to use XML in Java applications. The first step towards making this possible was the evolution of JAXP, which made it easier to obtain either a DOM Document or a SAX-compliant parser via a factory class. This reduced the dependence of Java developers on the numerous vendors supplying parsers of either type. Additionally, JAXP made sure that switching between parsers required minimal code changes.
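The factory idea can be sketched roughly like this. The class name JaxpFactorySketch and its two helper methods are invented for the example; the factory classes themselves are the standard javax.xml.parsers ones. The application code never names a vendor class, so whichever implementation is configured (or the JDK's built-in default) gets used.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical class: obtains SAX and DOM parsers without naming any vendor.
class JaxpFactorySketch {
    static SAXParser newSaxParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // configured on the factory, not on a vendor class
        return factory.newSAXParser();
    }

    static DocumentBuilder newDomBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }

    public static void main(String[] args) throws Exception {
        // The concrete classes printed here depend on the installed implementation.
        System.out.println(newSaxParser().getClass().getName());
        System.out.println(newDomBuilder().getClass().getName());
    }
}
```

Swapping in another parser is then a deployment matter, for example by setting the javax.xml.parsers.SAXParserFactory or javax.xml.parsers.DocumentBuilderFactory system property to the vendor's factory class, while the code above stays as it is.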

JDOM - Java Document Object Model

Even though JAXP reduced the need to care about the different parser implementations, it still required developers to use either DOM or SAX for manipulating the XML data. JDOM evolved from the idea of moving further towards Java and Java-like constructs while processing XML documents, and away from non-Java structures like Attributes (in SAX) and NamedNodeMap (in DOM). Java developers can now use the much more familiar Java Collection classes to manipulate XML data. Moving towards the customary Java constructs is also credited with making the processing faster - reportedly close to SAX in many cases, although JDOM still builds an in-memory tree of the document.

So, now that we are aware of what SAX and DOM are, let's move towards discussing the differences between the two. As is the case with most of the other technological comparisons, neither of the two is an absolute favourite and the choice would more often than not depend upon your requirement. SAX v/s DOM. When to use what?


Anonymous said...

Hi Geek,
Could you please throw more light on what you mean by Event handling in SAX? Maybe with a small example, it would be much easier. Thanks.

Geek said...

We already have an article with the Java implementation (and a brief explanation of the source code) of a simple SAX Parser using SAX2 APIs. Let me know if you were looking for something else.

Find the post here. I've now updated this article by adding a link to that one for better visibility.

You may want to give the Search (located at the blog header and beneath the article body) a try next time you're looking for something. It'll save you the time spent waiting for my response :-). Keep visiting/posting!

Anonymous said...

Hi Geek,
Could you come up with an article on JAXB as well, along with a comparison with the other methods?