
XML Parsing with SAX


XML has become a standard part of the IT landscape in a very short time. It provides a platform- and vendor-neutral approach to describing data, thus easing the burden of exchanging data between disparate systems. As a developer, I had to learn how to work with this new data format.

XML arrives as plain text, and the burden is on the developer to interpret it properly by parsing it. Currently, there are two standard parsing methods: DOM and SAX.

DOM (Document Object Model) treats an XML document as a whole. The XML is stored in memory. Once in memory, the data can be worked with as needed. One drawback to this approach, however, is the amount of memory consumed by the XML. Also, it is overkill if you are only concerned with one piece of data in the XML document. This is where SAX enters the picture.

SAX (Simple API for XML) processes an XML document as it is read. Each part of the document that the parser encounters, such as the start of an element, the end of an element, or a run of text, is reported as an event. The XML document is not stored in memory; it is read and discarded. It’s up to the developer to store any pertinent data. Thus, code is written to handle certain elements, data, or attributes of an XML document, and that code runs when the specified item is encountered.

We’ll utilize SAX 1.0 in this article. A newer version has been released (2.0), but 1.0 is widely used and is supported in the newest version. We’ll use the freely available Xerces Java parser in this article. It is available from the Apache Software Foundation (xml.apache.org).

SAX Classes and Methods

If you’ve worked with the DOM, you’ll be pleasantly surprised by how easy it is to use SAX. The SAX classes are contained in the org.xml.sax package. The base class in this package is called HandlerBase. The HandlerBase class contains all methods for working with XML data. Methods exist to deal with character data, elements, errors, and so forth.

In order to utilize SAX in your code, your XML parsing classes must use the HandlerBase class as the base or parent class. That is, you must extend it like so:

public class MyClassName extends HandlerBase

Once you have used HandlerBase as your base class, all of its methods for handling XML are available in your class.

Figure 1 shows a list of methods of the HandlerBase class; the list is not exhaustive, but it gives an idea of how SAX works. You can override all, some, or none of these methods in your code. All of the methods are public, return no value (void), and can throw SAX exceptions (SAXException class).

SAX in Action

The easiest way to get an idea of how the class and its methods are used is with an example. The XML shown in Figure 2 will be used in our first example; it contains a list of books. Each book entry contains a title and ISBN. The XML parsing class shown in Figure 3 utilizes SAX and print statements to handle events (XML entities) with the following 15 steps:

1. The appropriate SAX classes are made available in the code.

2. The class is declared.

3. A String object is used to store text data.

4. A String object is used to store the entity name.

5. The startDocument method is fired at the beginning of an XML document.

6. The endDocument method is fired at the end of an XML document.

7. The startElement method is fired at the opening tag of an element. The element name is one of its parameters, and it is used to populate the String object.

8. The String object is set to the name of the current element.

9. The endElement method is fired at the ending tag of an element.

10. The String object is cleared at the end of the element.

11. The characters method is fired when text is encountered inside an element. A character array, the starting point of the text in the array, and the length of the text are passed as parameters in the method.

12. A String object is populated with the current text.

13. The text is displayed if, and only if, it is not blank. The trim and equalsIgnoreCase methods of the String class are utilized.

14. The error method is fired when parsing errors are encountered.

15. The fatalError method is fired when fatal errors are encountered.
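One detail is worth keeping in mind for steps 11 through 13: a SAX parser is allowed to deliver the text of a single element through more than one call to the characters method, so printing from inside characters can split one value across several lines. The sketch below shows one way to guard against that by buffering the text and reporting it in endElement. It is a variation for illustration, not the code from Figure 3. The class name BufferedTextHandler and its parse helper are hypothetical, and the sketch assumes a modern JDK, whose built-in JAXP parser stands in for Xerces here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical variant of a SAX 1.0 handler: text is buffered per element
// because one run of text may arrive through several characters() calls.
@SuppressWarnings("deprecation") // HandlerBase and AttributeList are SAX 1.0 types
class BufferedTextHandler extends HandlerBase {
    private final StringBuilder text = new StringBuilder();
    final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String name, AttributeList atts) {
        text.setLength(0); // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may run more than once per element
    }

    @Override
    public void endElement(String name) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            lines.add("Data in element (" + name + "): " + value);
        }
        text.setLength(0); // leftover whitespace must not leak into the parent
    }

    // Assumption: the JDK's built-in JAXP parser is used instead of Xerces.
    static List<String> parse(String xml) {
        BufferedTextHandler handler = new BufferedTextHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        String xml = "<booklist><book><title>Practical LotusScript</title>"
                + "<isbn>1884777767</isbn></book></booklist>";
        for (String line : parse(xml)) {
            System.out.println(line);
        }
    }
}
```

This simple buffer suits documents like the book list, where text appears only in the innermost elements; mixed content would need a buffer per nesting level.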

Now we have our SAX class, and we need to utilize it in an application that reads an XML document. Figure 4 shows the Java code for our saxDemo class. The following 10 steps explain the Java code; Figure 5 shows the resulting output:

1. The SAXException class is available in the code.

2. The SAXParser class from Xerces is available in the code.

3. The class is declared.

4. A local file is used as the XML source.

5. The main method is the entry point of the Java application.

6. An instance of the SAXParser class is created.

7. An instance of the saxExample class is created.

8. Call the setDocumentHandler method of the SAXParser class. This assigns the class as the document handler.

9. The parse method of the SAXParser class is used to parse our file.

10. Errors are handled.
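The error handling in step 10, together with the error and fatalError methods of the handler, can be made more informative. The SAXParseException object passed to those methods carries the position of the problem, which is available through its getLineNumber and getColumnNumber methods. The following is a minimal sketch of that idea, not the article's code: LocatingErrorHandler and check are hypothetical names, and the JDK's built-in JAXP parser is assumed in place of Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler that records where a parsing problem occurred.
@SuppressWarnings("deprecation") // HandlerBase is a SAX 1.0 type
class LocatingErrorHandler extends HandlerBase {
    String lastProblem; // description of the most recent problem, or null

    @Override
    public void error(SAXParseException e) {
        lastProblem = describe("Error", e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        lastProblem = describe("Fatal error", e);
        throw e; // a fatal error means the document cannot be parsed further
    }

    private static String describe(String kind, SAXParseException e) {
        return kind + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    // Returns null when the document parses cleanly, otherwise a description.
    // Assumption: the JDK's built-in JAXP parser is used instead of Xerces.
    static String check(String xml) {
        LocatingErrorHandler handler = new LocatingErrorHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (handler.lastProblem == null) {
                handler.lastProblem = e.getMessage(); // not reported via the handler
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not read XML", e);
        }
        return handler.lastProblem;
    }

    public static void main(String[] args) {
        // The book element is never closed, so the parser reports a fatal error.
        System.out.println(check("<booklist>\n<book>\n</booklist>"));
    }
}
```

The exact wording of the message depends on the parser, so the sketch only adds the position in front of whatever text the parser supplies.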

Setting It All Up

I want to discuss setup before diving into the next example. The Java classes that comprise the Xerces XML parser (available for free download at xml.apache.org) must be available in your Java development environment. I am using the command-line Java Development Kit (JDK) from Sun Microsystems, so the system classpath variable is used to locate needed Java files. The classpath variable is used when locating class files referenced in import statements or the base Java class files. I am using version 1.1.8 of the Sun JDK and Xerces 1.01. Xerces is installed in the xerces directory on my D drive, and the JDK is installed in the jdk1.1.8 directory on my C drive. Therefore, my classpath for these examples is:

classpath = c:\jdk1.1.8\lib\classes.zip;d:\xerces\xerces.jar

The classes.zip file contains all base Java classes. The setup for your development environment may be different, so please consult your documentation.
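If a class cannot be found at compile time or run time, a small diagnostic program can show what the Java runtime actually sees. The sketch below is only an illustration under stated assumptions: ClasspathCheck and isAvailable are hypothetical names, and the code uses language features from a newer JDK than the 1.1.8 release mentioned above. It prints the effective classpath and reports whether the SAX and Xerces classes used in this article can be located.

```java
// Hypothetical diagnostic: reports whether the classes this article relies on
// can be located on the current classpath.
class ClasspathCheck {
    // True when the named class can be found without initializing it.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.class.path = "
                + System.getProperty("java.class.path"));
        System.out.println("SAX HandlerBase found: "
                + isAvailable("org.xml.sax.HandlerBase"));
        System.out.println("Xerces SAXParser found: "
                + isAvailable("org.apache.xerces.parsers.SAXParser"));
    }
}
```

If the Xerces line reports false, the xerces.jar entry in the classpath is the first thing to check.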

Taking It a Step Farther

Although the first example may seem simple, it introduces the major aspects of parsing XML via SAX. After all, the S in SAX stands for simple. The example shown in Figure 6 uses the same basic XML, but it is received via an HTTP request. The file is parsed and displayed as an HTML table. The saxExample class from Figure 3 is modified by adding statements to output HTML. The beginning and end of the XML document generate the beginning and ending HTML table tags. Each book occupies one row in the table, with its individual values in separate cells.

Figure 7 shows how the code from Figure 4 can be changed slightly to work with a URL instead of a local file to produce the following output:


<table>
<tr>
<td>Domino Development With Java</td>
<td>1930110049</td>
</tr>
<tr>
<td>Practical LotusScript</td>
<td>1884777767</td>
</tr>
</table>

Notice the output lists only the HTML for a table. It does not format an entire HTML page. For this reason, this code could be used inside a JavaServer Page (JSP), a servlet, or any other Java application.
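One caution applies when parsed text is written into HTML. The parser hands the characters method decoded text, so an ampersand written as &amp; in the XML arrives as a plain & character and should be escaped again before it is placed in a table cell. The sketch below shows one way to do that; HtmlCell, escape, and cell are hypothetical helper names, not part of SAX or Xerces.

```java
// Hypothetical helper for writing parsed text into an HTML table cell.
class HtmlCell {
    // Replaces the characters that have special meaning in HTML.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps the escaped text in table cell tags.
    static String cell(String text) {
        return "<td>" + escape(text) + "</td>";
    }

    public static void main(String[] args) {
        System.out.println(cell("Tips & Tricks")); // prints <td>Tips &amp; Tricks</td>
    }
}
```

A handler that builds table cells could call a helper like cell from its characters or endElement method in place of plain string concatenation.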

Only the Beginning

SAX is an excellent way to handle XML documents when only portions of the data are needed. It is an alternative to, not a replacement for, the DOM approach. There are instances in which the whole XML document is needed and SAX will not suffice. SAX is available in most Java and C parsers and in the most recent version of the Microsoft parser included in its Internet Explorer browser.
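To make the "only portions of the data" point concrete, here is a minimal sketch of a handler that keeps just the ISBN values from the book list and discards everything else as it streams past. IsbnCollector and collect are hypothetical names, the element name isbn comes from the sample output shown in Figure 5, and the JDK's built-in JAXP parser is assumed in place of Xerces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical handler that stores only the text of isbn elements.
@SuppressWarnings("deprecation") // HandlerBase and AttributeList belong to SAX 1.0
class IsbnCollector extends HandlerBase {
    private final List<String> isbns = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private boolean inIsbn;

    @Override
    public void startElement(String name, AttributeList atts) {
        if (name.equals("isbn")) {
            inIsbn = true;
            current.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inIsbn) {
            current.append(ch, start, length); // all other text is discarded
        }
    }

    @Override
    public void endElement(String name) {
        if (name.equals("isbn")) {
            isbns.add(current.toString().trim());
            inIsbn = false;
        }
    }

    // Assumption: the JDK's built-in JAXP parser is used instead of Xerces.
    static List<String> collect(String xml) {
        IsbnCollector handler = new IsbnCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.isbns;
    }

    public static void main(String[] args) {
        String xml = "<booklist>"
                + "<book><title>Domino Development With Java</title>"
                + "<isbn>1930110049</isbn></book>"
                + "<book><title>Practical LotusScript</title>"
                + "<isbn>1884777767</isbn></book>"
                + "</booklist>";
        System.out.println(collect(xml)); // prints [1930110049, 1884777767]
    }
}
```

Memory use here grows with the number of ISBNs kept, not with the size of the document, which is the practical difference from loading the whole document with the DOM.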

XML is the latest “hot” technology, but it is definitely here to stay. It provides a universal, platform-independent approach to describing data. It has been embraced by the development community and industry giants like IBM, Oracle, and Microsoft. In fact, Microsoft has made it an integral part of the next iteration of its Office suite.

REFERENCES AND RELATED MATERIALS

• Apache XML Project Web site: xml.apache.org

• IBM DeveloperWorks XML Zone: www.ibm.com/xml

• Java and XML. Brett McLaughlin and Mike Loukides. Cambridge, Massachusetts: O’Reilly and Associates, 2000

• The XML Companion. Neil Bradley. Addison-Wesley Publishing Co., 2000

• XML.com Web site: www.xml.com

• XML Developer Center: msdn.microsoft.com/xml

• XML in Action. William J. Pardi. Microsoft Press, 1999

• XML.org Web site: www.xml.org

• XML Pocket Reference. Robert Eckstein. Cambridge, Massachusetts: O’Reilly and Associates, 1999

• XML Programming with VB and ASP. Mark Wilson and Tracey Wilson. Greenwich, Connecticut: Manning Publications, 1999

Method                 Description

characters             The characters method is triggered when character data is encountered inside an element.
endDocument            The endDocument method is triggered at the end of an XML document.
startDocument          The startDocument method is triggered at the beginning of an XML document.
error                  The error method is fired when a parser error is raised.
processingInstruction  The processingInstruction method is triggered when a processing instruction is encountered.
startElement           The startElement method is triggered at the beginning of an element in the XML document.
endElement             The endElement method is triggered at the ending tag of an element in the XML document.

Figure 1: The HandlerBase class contains a number of methods that are invoked when the XML parser encounters various portions of a document.
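Of the methods listed in Figure 1, processingInstruction is the only one that the article's examples do not exercise. The sketch below illustrates it under stated assumptions: PiLister and list are hypothetical names, the sort-order instruction is made up for the demonstration, and the JDK's built-in JAXP parser is assumed in place of Xerces. The method receives the target of the instruction and the remaining data as two strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical handler that records each processing instruction it is given.
@SuppressWarnings("deprecation") // HandlerBase is a SAX 1.0 type
class PiLister extends HandlerBase {
    final List<String> found = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        found.add(target + " -> " + data);
    }

    // Assumption: the JDK's built-in JAXP parser is used instead of Xerces.
    static List<String> list(String xml) {
        PiLister handler = new PiLister();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.found;
    }

    public static void main(String[] args) {
        // "sort-order" is a made-up instruction used only for this demonstration.
        String xml = "<?xml version=\"1.0\"?>"
                + "<?sort-order ascending?>"
                + "<booklist/>";
        for (String pi : list(xml)) {
            System.out.println(pi);
        }
    }
}
```

The XML declaration at the top of a document is not a processing instruction, so it is not passed to this method.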


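<booklist>
    <book>
        <title>Domino Development With Java</title>
        <isbn>1930110049</isbn>
    </book>
    <book>
        <title>Practical LotusScript</title>
        <isbn>1884777767</isbn>
    </book>
</booklist>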


Figure 2: An XML document is a set of nested tags that describe the data that it contains.

//1.
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

//2.
public class saxExample extends HandlerBase {
    //3.
    private String chValue;
    //4.
    private String eName;

    //5.
    public void startDocument() throws SAXException {
        System.out.println("Beginning of XML document.");
    }

    //6.
    public void endDocument() throws SAXException {
        System.out.println("End of XML document.");
    }

    //7.
    public void startElement(String name, AttributeList amap)
            throws SAXException {
        //8.
        eName = name;
        System.out.println("Beginning of element: " + name);
    }

    //9.
    public void endElement(String name) throws SAXException {
        //10.
        eName = "";
        System.out.println("End of element: " + name);
    }

    //11.
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        //12.
        chValue = new String(ch, start, length);
        //13.
        if (!chValue.trim().equalsIgnoreCase("")) {
            System.out.println("Data in element (" + eName + "): " + chValue);
        }
    }

    //14.
    public void error(SAXParseException e) {
        System.out.println("Error has occurred.");
    }

    //15.
    public void fatalError(SAXParseException e) {
        System.out.println("Fatal error has occurred.");
    }
}

Figure 3: The saxExample class is a utility class that customizes the SAX handler methods of the HandlerBase parent class.

//1.
import org.xml.sax.SAXException;
//2.
import org.apache.xerces.parsers.SAXParser;

//3.
public class saxDemo {
    //4.
    private static String xmlSource = "c:\\books.xml";

    //5.
    public static void main(String argv[]) throws SAXException {
        //6.
        SAXParser parser = new SAXParser();
        try {
            //7.
            saxExample saxTest = new saxExample();
            //8.
            parser.setDocumentHandler(saxTest);
            //9.
            parser.parse(xmlSource);
        //10.
        } catch (SAXException e) {
            System.out.println("SAX Exception");
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Figure 4: The saxDemo class uses the saxExample utility class to list the contents of the books.xml file.

Beginning of XML document.
Beginning of element: booklist
Beginning of element: book
Beginning of element: title
Data in element (title): Domino Development With Java
End of element: title
Beginning of element: isbn
Data in element (isbn): 1930110049
End of element: isbn
End of element: book
Beginning of element: book
Beginning of element: title
Data in element (title): Practical LotusScript
End of element: title
Beginning of element: isbn
Data in element (isbn): 1884777767
End of element: isbn
End of element: book
End of element: booklist
End of XML document.

Figure 5: The books.xml file is listed as standard output by the saxDemo class (Figure 4) as parsing events are handled by the saxExample class (Figure 3).

import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class saxExample2 extends HandlerBase {

    private String chValue;

    //The beginning of an HTML table is printed at the start of the XML document.
    public void startDocument() throws SAXException {
        System.out.println("<table>");
    }

    //The ending HTML table tag is printed at the end of the XML document.
    public void endDocument() throws SAXException {
        System.out.println("</table>");
    }

    //A new HTML table row is started for each book element.
    public void startElement(String name, AttributeList amap)
            throws SAXException {
        if (name.equalsIgnoreCase("book")) {
            System.out.println("<tr>");
        }
    }

    //The HTML table row is ended at the end of each book element.
    public void endElement(String name) throws SAXException {
        if (name.equalsIgnoreCase("book")) {
            System.out.println("</tr>");
        }
    }

    //Display each text value in a separate HTML table cell.
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        chValue = new String(ch, start, length);
        if (!chValue.trim().equalsIgnoreCase("")) {
            System.out.println("<td>" + chValue + "</td>");
        }
    }
}

Figure 6: The saxExample2 class outputs an HTML table.

import org.xml.sax.SAXException;
import org.apache.xerces.parsers.SAXParser;

public class saxDemo2 {

    private static String xmlSource = "http://127.0.0.1/books.xml";

    public static void main(String argv[]) throws SAXException {
        SAXParser parser = new SAXParser();
        try {
            saxExample2 saxTest = new saxExample2();
            parser.setDocumentHandler(saxTest);
            parser.parse(xmlSource);
        } catch (SAXException e) {
            System.out.println("SAX Exception");
        } catch (Exception e) {
            //The parse method can also throw an IOException, so other errors are caught here.
            e.printStackTrace();
        }
    }
}

Figure 7: The saxDemo2 class uses a URL instead of a file and the HTML output capabilities of saxExample2 (Figure 6) to Web-enable the book XML.
