Saturday, April 4, 2009

Implementation of SAX Parser in Java using SAX2 APIs


If you have reached this article directly, you may like to refresh your understanding of SAX before we discuss the implementation of a sample SAX-based XML parser in Java. See this article: Evolution of Java and XML combo. SAX, DOM, JAXP, JDOM

Implementation of a SAX2-based XML parser in Java

We will start by looking at the steps involved in writing a SAX-based XML parser in Java, and then we'll see the code listing and the output. The implementation can be broken down into the following six steps:

(1) Inheriting DefaultHandler:
If you're using SAX2, you can inherit from the class DefaultHandler, which is the base class for SAX2 event handlers. It provides default implementations for all the callbacks of the four core SAX2 handler interfaces: EntityResolver, DTDHandler, ContentHandler, and ErrorHandler. In most cases we need to override only the methods of the ContentHandler interface. If you are using SAX1, you would use the (now deprecated) HandlerBase class in place of DefaultHandler. The signatures of some SAX1 methods differ from those of their SAX2 counterparts, so you would need to adjust your overriding method definitions accordingly.

public class SAXXMLParserImpl extends DefaultHandler{
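
To illustrate how little needs to be overridden, here is a minimal sketch of a handler that overrides just one ContentHandler callback and inherits the defaults for everything else. The class name ElementCounter and its helper method count are hypothetical names used only for this illustration, and the XML is passed as an in-memory string rather than a file to keep the example self-contained.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of the article's listing): it overrides a
// single callback and relies on DefaultHandler for all the others.
public class ElementCounter extends DefaultHandler {

    private int elementCount = 0;

    // Called once for every start tag the parser encounters
    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        elementCount++;
    }

    public int getElementCount() {
        return elementCount;
    }

    // Helper for this sketch: parses an XML string and returns the element count
    public static int count(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getElementCount();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<LanguageList><Language id=\"1\"><Name>Java</Name></Language></LanguageList>";
        System.out.println(count(xml)); // prints 3
    }
}
```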

(2) New instance of SAXParserFactory: SAX parsers are obtained from a factory class named 'SAXParserFactory', so you first need to get an instance of the factory.
//Getting a new instance of the SAX Parser Factory
SAXParserFactory factory = SAXParserFactory.newInstance();

(3) New instance of SAX Parser: once you have a factory instance, you can call its newSAXParser() method to get a new instance of the SAX Parser.
//Getting a parser from the factory
SAXParser saxParser = factory.newSAXParser();

(4) Parsing the XML document: now that you have a SAX Parser instance, you just need to pass the XML document and a DefaultHandler instance to its parse method.
//Parsing the XML document using the parser
saxParser.parse( new File(XML_FILE_TO_BE_PARSED), new SAXXMLParserImpl() );
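
Steps (2) to (4) can be put together in a few lines. The sketch below is only an illustration under some assumptions: the class name ParseStepsSketch and the method isWellFormed are hypothetical, the document is read from an in-memory string instead of a file, and a plain DefaultHandler is passed so that nothing happens unless the parser reports a problem.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration only
public class ParseStepsSketch {

    // Returns true if the parser gets through the document without
    // raising a SAXException, false otherwise
    public static boolean isWellFormed(String xml) throws Exception {
        // Step (2): getting a new instance of the SAX Parser Factory
        SAXParserFactory factory = SAXParserFactory.newInstance();

        // Step (3): getting a parser from the factory
        SAXParser saxParser = factory.newSAXParser();

        // Step (4): parsing the document with a handler instance
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        try {
            saxParser.parse(in, new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors such as mismatched tags
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<a><b/></a>"));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```

The same three calls work with a File argument, as in the snippets above; only the source of the document changes.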

(5) Implementing the required handlers: inheriting from the DefaultHandler class gives you a default implementation of all the SAX2 callbacks, but most of these defaults simply do nothing. You would need to override at least some of the methods to do any useful processing of XML documents.
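
As an example of overriding a few callbacks together, the sketch below collects the text of every Name element. It is a hedged illustration and not the article's listing: the class NameCollector and its collect method are hypothetical names. It buffers the text in a StringBuilder because a SAX parser is allowed to deliver the character data of one element in more than one characters() call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration: gathers the text of <Name> elements
public class NameCollector extends DefaultHandler {

    private final List<String> names = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean insideName = false;

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        if ("Name".equals(qName)) {
            insideName = true;
            text.setLength(0); // start with an empty buffer for this element
        }
    }

    @Override
    public void characters(char[] buffer, int offset, int length) {
        // May be called several times for one element, so append each chunk
        if (insideName) {
            text.append(buffer, offset, length);
        }
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) {
        if ("Name".equals(qName)) {
            names.add(text.toString().trim());
            insideName = false;
        }
    }

    // Helper for this sketch: parses an XML string and returns the names found
    public static List<String> collect(String xml) throws Exception {
        NameCollector handler = new NameCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }
}
```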

(6) Cosmetic/Admin Stuff: you may like to have private members to keep references to the XML file path and the output stream. These members must of course be set correctly before they are used. Additionally, you may like to define a few simple helper methods for routine tasks to make the code more readable and maintainable.
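
One possible way to arrange these members is to hold the output Writer in an instance field that is set through the constructor, with a small helper that wraps I/O failures in a SAXException. This is only a sketch under that assumption: EchoHandler, emit, and echo are hypothetical names, and the helper does not escape special characters such as '&' or '<' in text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration: echoes tags and text to a Writer
public class EchoHandler extends DefaultHandler {

    // Reference to the output stream, set once through the constructor
    private final Writer out;

    public EchoHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) throws SAXException {
        emit("<" + qName + ">"); // attributes are ignored in this sketch
    }

    @Override
    public void endElement(String namespaceURI, String localName,
                           String qName) throws SAXException {
        emit("</" + qName + ">");
    }

    @Override
    public void characters(char[] buffer, int offset, int length) throws SAXException {
        emit(new String(buffer, offset, length)); // no escaping is applied
    }

    // Helper: writes a string and converts any IOException to a SAXException
    private void emit(String s) throws SAXException {
        try {
            out.write(s);
        } catch (IOException ioe) {
            throw new SAXException("I/O error while writing output", ioe);
        }
    }

    // Helper for this sketch: parses an XML string and returns what was echoed
    public static String echo(String xml) throws Exception {
        StringWriter sink = new StringWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new EchoHandler(sink));
        return sink.toString();
    }
}
```

Passing the Writer in from outside makes it easy to send the output to a StringWriter while experimenting, and to System.out or a file later.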

Source Code of the Implementation


SAXXMLParserImpl.java


import java.io.*;

import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;

public class SAXXMLParserImpl extends DefaultHandler{

//Path of the XML File to be parsed - private as we don't want it outside
//and 'final' as once it's assigned to a value (path), it doesn't require any change
private static final String XML_FILE_TO_BE_PARSED = "C:\\LanguageList.xml";

//Reference to the output stream
private static Writer out;

public static void main (String argv [])
{
//Getting a new instance of the SAX Parser Factory
SAXParserFactory factory = SAXParserFactory.newInstance();

try {

//Setting up the output stream - in this case System.out with UTF8 encoding
out = new OutputStreamWriter(System.out, "UTF8");

//Getting a parser from the factory
SAXParser saxParser = factory.newSAXParser();

//Parsing the XML document using the parser
saxParser.parse( new File(XML_FILE_TO_BE_PARSED), new SAXXMLParserImpl() );

} catch (Throwable throwable) { //Throwable as it can be either Error or Exception
throwable.printStackTrace ();
}
System.exit (0);
}

//Implementation of the required methods of the ContentHandler interface

public void startDocument()throws SAXException
{
printData("XML File being parsed: " + XML_FILE_TO_BE_PARSED);
printNewLine();printNewLine();
printData("INFO: ### Parsing of the XML Doc started ###");
printNewLine();printNewLine();

printData ("<?xml version='1.0' encoding='UTF-8'?>");
printNewLine();
}

public void endDocument()throws SAXException
{
try {
printNewLine();
printNewLine();
printData("INFO: ### Parsing of the XML Doc completed ###");

out.flush ();
} catch(IOException ioe) {
throw new SAXException ("ERROR: I/O Exception thrown while parsing XML", ioe);
}
}

public void startElement(String namespaceURI, String localName, String qName, Attributes atts)throws SAXException
{

printData ("<" + qName);

if (atts != null) {
for (int i = 0; i < atts.getLength (); i++) {
printData (" ");
printData (atts.getQName(i) + "=\"" + atts.getValue(i) + "\"");
}
}

printData (">");
}

public void endElement(String namespaceURI, String localName, String qName)throws SAXException
{
printData ("</" + qName + ">");
}

public void characters(char buffer [], int offset, int length)throws SAXException
{
String string = new String(buffer, offset, length);
printData(string);
}

//Definition of helper methods

//printData: accepts a String and prints it on the assigned output stream
private void printData(String string)throws SAXException
{
try {

out.write(string);
out.flush();

} catch (IOException ioe) {
throw new SAXException ("ERROR: I/O Exception thrown while printing the data", ioe);
}
}

//printNewLine: prints a new line on the underlying platform
//end of line character may vary from one platform to another
private void printNewLine()throws SAXException
{
//Getting the line separator of the underlying platform
String endOfLine = System.getProperty("line.separator");

try {

out.write (endOfLine);

} catch (IOException ioe) {
throw new SAXException ("ERROR: I/O Exception thrown while printing a new line", ioe);
}
}

}

LanguageList.xml
<?xml version="1.0" encoding="UTF-8"?>
<LanguageList>
<Language id = "1">
<Name>Java</Name>
<Description>Arguably the most wodely used language for Application Dev</Description>
</Language>
<Language id = "2">
<Name>C</Name>
<Description>Arguably the most widely used language for System Soft Dev</Description>
</Language>
</LanguageList>

Output
XML File being parsed: C:\LanguageList.xml

INFO: ### Parsing of the XML Doc started ###

<?xml version='1.0' encoding='UTF-8'?>
<LanguageList>
<Language id="1">
<Name>Java</Name>
<Description>Arguably the most wodely used language for Application Dev</Description>
</Language>
<Language id="2">
<Name>C</Name>
<Description>Arguably the most widely used language for System Soft Dev</Description>
</Language>
</LanguageList>

INFO: ### Parsing of the XML Doc completed ###
