Fast XML with RPG IV and SAX

In my article, "RPG IV and XML Together," in the November 2000 issue of Midrange Computing, I detailed the steps necessary to create an XML document for a security-policy compliance auditing application and to build an XML Document Object Model (DOM) tree as the in-memory representation of the security policy. I showed how to programmatically navigate that tree to extract the recommended settings for security-related system values, evaluate those recommendations against the values on a target AS/400, and add the resulting rating to an output XML policy evaluation document for later analysis.

In that article, I briefly touched on the performance disadvantages of DOM when an XML document is of indeterminate (and potentially very large) size. In that case, DOM is a poor choice for an application that must provide timely reporting, because building the XML DOM tree consumes an extraordinary amount of memory, CPU time, and other system resources. To overcome this limitation, early XML users (David Megginson and the developers on the xml-dev list) devised a method for handling real-world applications called Simple API for XML (SAX) that does not require building an internal tree representation of the document in memory. The primary difference between DOM and SAX, from a performance standpoint, is that DOM holds the entire document in memory and is therefore extremely memory-intensive, while SAX is not.

In this article, I will show you how to use SAX to process documents for which DOM is not the appropriate choice: documents whose size cannot be predicted or is too large for efficient DOM processing, and documents that do not require DOM's navigational precision. In the example provided, you will learn how to use the SAX API to do the following:

  • Parse an XML document containing system-values policy requirements regarding security-related system values.
  • Extract recommended values from the policy document.
  • Rate the current AS/400 system value against recommended system values by using compile-time lookup ratings and add the rating to the document for the appropriate system value.
  • Write the new document back out to a designated AS/400 Integrated File System (AS/400 IFS) directory stream file.

How SAX Does It for You!

The actual mechanics of the SAX API involve a bit of handshaking between a scanner (application code, not the hardware), a parser, and a document handler. The scanner is a core part of the SAX API supplied by the vendor (in this case, IBM alphaWorks). The parser can be supplied by anyone conforming to the SAX API interface requirements but is typically supplied by the vendor supplying the scanner. The document handler is where you come in. The SAX API defines an interface (a set of procedure and method signatures) that must be implemented by any user who wishes to interact with XML documents. IBM alphaWorks has already implemented this interface with its C++ classes and has given you a set of procedures to interact with those classes, as defined by the "XML for Procedural Language Interface," available for download at www.alphaworks.ibm.com/tech/xml4rpg. Although the Web site calls the interface "XML Interface for RPG," it can be used by ILE C and COBOL as well. You can refer to my November 2000 article for more information.

As the scanner passes through an XML document, it uses established callback events (defined by the XMLDocumentHandler interface) to pass data to the parser. The SAX API gives you access to XML document information by letting you register your interest in a set of event types and establish procedures in your application that the SAX API calls back when a registered event is encountered during document scanning. Data for each event is passed to the application procedure defined (and registered) to handle it as the parser passes the related node type. In the next section, I will show you how easy it is to use SAX to read in a large XML document and to create a new and larger XML document as output.
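
For readers who want to try the callback idea without an AS/400 at hand, the following is a minimal sketch in Python's standard xml.sax module. It is an analogy only, not the IBM alphaWorks interface, and the sample document and its element names are assumptions rather than the contents of Figure 1. The handler declares interest in a single event (start of element); every other event falls through to the do-nothing defaults.

```python
import xml.sax


class ElementNames(xml.sax.ContentHandler):
    """Responds only to the start-element event; all other events use the
    no-op defaults inherited from ContentHandler."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # The parser calls this each time an opening tag is scanned.
        self.names.append(name)


# Hypothetical sample document, not the article's Figure 1.
SAMPLE = b"<Policy><SystemValue><Name>QPWDEXPITV</Name></SystemValue></Policy>"

handler = ElementNames()
xml.sax.parseString(SAMPLE, handler)
print(handler.names)
```

Nothing in the handler ever holds more than the current event, which is the property that keeps SAX processing light on memory.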

SAX in 21 Minutes and 10 Easy Steps

Whether your appreciation of SAX takes 21 minutes or not, I believe you will enjoy learning that SAX is easier than DOM. However, before you get into the example application, you need to review and understand some requisite details about the example.


FIGURE 1: DTD for Security Auditing Policy Recommendations & sample document.

The document type definition (DTD) shown in Figure 1 shows the expected format of the incoming XML security-policy source document that I use in my demonstrative SAX parsing program (SEC0002RS) and a sample XML document using that format as presented in an XML-enabled browser. Note the document can have either a single value or a range of recommended values.


FIGURE 2: DTD for Security Auditing Policy Evaluation Rating.

The DTD shown in Figure 2 shows the expected format of the new XML document (and a sample) that will be created as output by using callback events from the SAX parser. You can refer to my November 2000 article if you require a refresher on DTDs. Next, I will lead you through a 10-step process to write your own SAX parsing program.

Step 1: Set a Pointer to DOM Exceptions Feedback Area

The first step in creating an XML document with SAX is to get the pointer (Figure 3, Section D) to the Qxml_DOMExcData data structure. This data structure is defined in QXML4PR310 headers provided in the alphaWorks download and contains feedback information about the success or failure of SAX API calls. You must pass the pointer along to the QxmlInit function as shown in Figure 3, Section E before any SAX functions are called. While you will be using the SAX API to parse the incoming XML document, some high-level parsing errors are reported in the DOM exception data structure. To get more granular in your error handling, you will want to use the SAX error-handler interface that I will present later.


FIGURE 3: Using SAX to parse a large XML document.

Proper error handling can only be ensured by checking the return code (Qxml_DOMRtnCod) after a SAX API method executes and by taking appropriate action when an error occurs. The meaning of each return code is given in the QXML4PR310 copy member, where the codes are defined as integer constants.

Step 2: Create a New Instance of a SAX Parser

Next, you need to create an instance of the underlying SAX parser with the statement:

Eval SAXParse@ = QxmlSAXParser_new

This procedure takes no arguments and returns a pointer to the SAX parser instance just created. The parser instance is used to register callback events, set the related callbacks to application document and error handler procedures, and send the source input XML file to the parser pointed to by SAXParse@. This instance creation is presented in Figure 3, Section F.

Step 3: Set the Parser Validation Mode

In this step, the parser validation mode is set. Validation is the process of ensuring that the accompanying XML document observes the format defined by its document type (specified in the "<!DOCTYPE>" declaration). In the exercise presented in this article, I assume a non-validating parser will be used. However, if you wish to implement the validating version, the DTDs and sample validation code are included in the downloadable source for this article.

CallP QxmlSAXParser_setDoValidation( pointerToSAXParser: validationMode)

To set the validation mode, all that is required is to pass two arguments to the QxmlSAXParser_setDoValidation procedure. The first argument is the pointer to the SAX parser instance you created in Step 2. The second argument is an integer value of zero for non-validating and one for the validating parser. This is illustrated in Figure 3, Section G.
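
The same "set a switch on the parser before parsing" pattern can be sketched in Python's standard xml.sax module, again as an analogy rather than the Qxml interface. The helper name set_validation is an assumption made for illustration. Note that the expat-based parser bundled with Python is non-validating, so it accepts a request to turn validation off and refuses a request to turn it on.

```python
import xml.sax
from xml.sax.handler import feature_validation


def set_validation(parser, enabled):
    """Try to set the validation mode; return True if the parser accepted it.

    Hypothetical helper for illustration only.
    """
    try:
        parser.setFeature(feature_validation, enabled)
        return True
    except (xml.sax.SAXNotSupportedException,
            xml.sax.SAXNotRecognizedException):
        return False


parser = xml.sax.make_parser()
accepted_off = set_validation(parser, False)  # non-validating mode
accepted_on = set_validation(parser, True)    # validating mode
print(accepted_off, accepted_on)
```

As in the RPG program, the mode has to be chosen before the parse call is issued.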

Step 4: Create a Document-Handler Instance for the Current Parser

Next, create an instance of an underlying document handler that will use the pointers to callback procedures (discussed shortly), shown in Figure 3, Section B in the example program, to respond to SAX events.

Eval DocHndlr@ = QxmlDocumentHandler_new

To accomplish this, call the QxmlDocumentHandler_new procedure that takes no arguments and returns a pointer (DocHndlr@) to the document handler, as shown in Figure 3, Section H. You will use this pointer to register callback events and related procedures.

Step 5: Create an Error-Handler Instance for the Current Parser

Next, you will create an instance of an underlying error handler that will use the pointers to callback procedures (discussed shortly), shown in Figure 3, Section B in the example program, to respond to SAX exception events.

Eval ErrHndlr@ = QxmlErrorHandler_new

To accomplish this, call the QxmlErrorHandler_new procedure that takes no arguments and returns a pointer (ErrHndlr@) to the error handler, as shown in Figure 3, Section I. You will use this pointer to register what procedure in your application handles each of the three types of SAX error events.

Step 6: Connect the Document Handler to the Current Parser

During this step, you will establish a connection between the parser instance created in Step 2 and the document-handler instance created in Step 4.

CallP QxmlSAXParser_setDocumentHandler
          (pointerToSAXParser:
          pointerToDocumentHandler)

When the parser, pointed to by the first argument, gets a document event for which a procedure has been registered, the parser uses the pointer in argument 2 to find the document handler and get a pointer to the callback procedure in the application that handles that event. The code in the sample application to set the document handler is shown in Figure 3, Section H.
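
The wiring performed in Steps 2, 4, and 6 (create a parser, create a document handler, connect the two) has a close counterpart in Python's standard xml.sax module. The sketch below is an analogy under that assumption, not the Qxml calls themselves; the CountingHandler class and the tiny inline document are hypothetical.

```python
import io
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Counts start-element events delivered by the parser."""

    def __init__(self):
        super().__init__()
        self.element_count = 0

    def startElement(self, name, attrs):
        self.element_count += 1


parser = xml.sax.make_parser()         # comparable to Step 2
doc_handler = CountingHandler()        # comparable to Step 4
parser.setContentHandler(doc_handler)  # comparable to Step 6

# Hypothetical input; the parser now routes its events to doc_handler.
parser.parse(io.BytesIO(b"<a><b/><b/></a>"))
print(doc_handler.element_count)
```

Keeping the parser and the handler as separate objects is what allows one parser to be pointed at different handlers, which is the same separation the pointer pair SAXParse@ and DocHndlr@ expresses.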

Step 7: Connect the Error Handler to Parser

This step is when you will establish a connection between the parser instance created in Step 2 and the instance of the error handler created in Step 5.

CallP QxmlSAXParser_setErrorHandler
          (pointerToSAXParser:
          pointerToErrorHandler)

When the parser, pointed to by argument 1, gets an error event for which a procedure has been registered, the parser uses the pointer in argument 2 to find the error handler and get a pointer to the callback procedure in the application that handles that error type event. The code in the sample application to set the error handler is shown in Figure 3, Section I.

Step 8: Set Callback Procedures for Each of the Three Error Events

The error handler, created in Step 5 and connected to the parser in Step 7, will deal with each of the following exception events:

  • Qxml_WARNINGHNDLR = warning
  • Qxml_ERRORHNDLR = error
  • Qxml_FATALERRORHNDLR = fatalError

To delegate handling for each error type event to user application procedures, the following procedure is called to establish the procedure to be called in the application in the event the related error occurs.

CallP QxmlErrorHandler_setCallback
          (pointerToErrorHandler:
          numericConstantForErrorType:
          pointerToCallbackProcedure)

The QxmlErrorHandler_setCallback procedure is called once for each of the three error types indicated above to establish a procedure in your application that will handle warnings, errors, and fatal-error conditions. You can supply three different procedures or use the same procedure to handle all three types. If you use a common procedure to handle all error events, you will have to interrogate the DOM and SAX exception data structures to determine the severity and nature of the exception. The error callback procedures must conform to the procedure interface defined by the prototypes shown in Figure 3, Section C and are passed only a pointer to the error string. The QxmlErrorHandler_setCallback procedure takes the following:

  • A pointer to the error handler in the first argument.
  • One of the three numeric literal names above (i.e., Qxml_WARNINGHNDLR, Qxml_ERRORHNDLR, or Qxml_FATALERRORHNDLR) representing the related numeric constants that indicate the error type for which the callback is being set as the second argument.
  • A pointer to the callback procedure in your application that will handle this error event type as the last argument.

    In the sample application, this step is done in Figure 3, Section O.
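
The three error severities map directly onto the three methods of the SAX error-handler interface. The following sketch uses Python's standard xml.sax module as a stand-in for the Qxml error handler; the class name and the deliberately malformed input are assumptions for illustration. Each method records the severity and message, and the fatal case re-raises so that parsing stops.

```python
import xml.sax


class LoggingErrorHandler(xml.sax.ErrorHandler):
    """One callback per severity, mirroring warning, error, and fatalError."""

    def __init__(self):
        self.log = []

    def warning(self, exception):
        self.log.append(("warning", exception.getMessage()))

    def error(self, exception):
        self.log.append(("error", exception.getMessage()))

    def fatalError(self, exception):
        self.log.append(("fatalError", exception.getMessage()))
        raise exception  # a fatal error ends the parse


errors = LoggingErrorHandler()
try:
    # Deliberately malformed: <b> is never closed before </a>.
    xml.sax.parseString(b"<a><b></a>", xml.sax.ContentHandler(), errors)
except xml.sax.SAXParseException:
    pass

print(errors.log[0][0])
```

Using three separate methods avoids having to inspect a shared exception structure to work out the severity, which is the trade-off described above for a single common procedure.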

Step 9: Set the Callback Procedure for Each Required SAX Event

Next, you must set a callback to a procedure in your application for each SAX event you want to respond to.

CallP QxmlDocumentHandler_setCallback
                   (pointerToDocumentHandler:
                   numericConstantForEventType:
                   pointerToCallbackProcedure)


FIGURE 4: Table associating call event numeric literals with constant values in QXML4PR310 source member.

The QxmlDocumentHandler_setCallback procedure is called once for each SAX event you wish to handle as identified in Figure 4 by the numeric literal that translates to the proper numeric constant identifier for the event type. This must be a different procedure for each event type you wish to monitor and must accommodate the procedure interface requirements for the event type, as shown in Figure 5 and illustrated in Figure 3, Section A. This procedure takes the following:

  • A pointer to the document handler in the first argument.
  • One of the nine numeric literal names above representing the related numeric constants that indicate the document event type for which the callback is being set as the second argument.
  • A pointer to the callback procedure in your application that will handle this document event type as the last argument.

In the sample application, this is done in Figure 3, Section J for the start- and end-document events, Section K for the start- and end-element events, Section L for the characters event, Section M for the processing instructions event, and Section N for the ignorable whitespace event.
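
The registration style of Step 9 (an event identifier paired with a procedure) can be imitated in Python with a small adapter over the standard xml.sax handler. This is a sketch of the pattern only: the event identifiers, the CallbackHandler class, and its set_callback method are hypothetical and do not reproduce the constants of Figure 4.

```python
import xml.sax

# Hypothetical event identifiers standing in for the Figure 4 constants.
START_ELEMENT = "startElement"
END_ELEMENT = "endElement"
CHARACTERS = "characters"


class CallbackHandler(xml.sax.ContentHandler):
    """Forwards each SAX event to the procedure registered for it."""

    def __init__(self):
        super().__init__()
        self._callbacks = {}

    def set_callback(self, event, procedure):
        self._callbacks[event] = procedure

    def _fire(self, event, *args):
        procedure = self._callbacks.get(event)
        if procedure is not None:  # events with no registration are ignored
            procedure(*args)

    def startElement(self, name, attrs):
        self._fire(START_ELEMENT, name, attrs)

    def endElement(self, name):
        self._fire(END_ELEMENT, name)

    def characters(self, content):
        self._fire(CHARACTERS, content)


trace = []
handler = CallbackHandler()
handler.set_callback(START_ELEMENT, lambda name, attrs: trace.append("start " + name))
handler.set_callback(END_ELEMENT, lambda name: trace.append("end " + name))
xml.sax.parseString(b"<Value>90</Value>", handler)
print(trace)
```

No callback is registered for the characters event here, so the text "90" is scanned but never reaches the application.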


FIGURE 5: Interface for callback procedures.

Step 10: Parse the XML Document Using a SAX Parser

Finally, to start the parsing process, your application must call the QxmlSAXParser_parse_SystemId procedure:

CallP QxmlSAXParser_parse_SystemId
          (pointerToSAXParser:
          pointerToSourceInputXMLFile:
          documentEncodingSchemeConstant:
          0)

This procedure initiates the parsing process, and control is turned over to the parser pointed to in the first argument (SAXParse@). The second argument identifies a pointer (XmlFile@) to the source input XML document to be parsed. The third argument identifies the character-encoding scheme (Qxml_CCSID37) for the document, and the last argument tells the parser that the pointer in argument 2 points to an XML file name that is a null-terminated string. This step is shown in the example in Figure 3, Section Q.
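
Handing the parser a file name instead of the document's contents looks like this in Python's standard xml.sax module. The sketch is an analogy to the parse call above, not a rendering of it; the TextCollector class, the parse_file helper, and the temporary file are all assumptions made so the example can run on its own.

```python
import os
import tempfile
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Gathers all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)


def parse_file(path):
    handler = TextCollector()
    xml.sax.parse(path, handler)  # the parser opens and reads the file itself
    return "".join(handler.text)


# Write a small hypothetical document to a temporary directory, then parse it.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "policy.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?><Name>QPWDEXPITV</Name>')
    result = parse_file(path)

print(result)
```

Control stays inside the parse call until the whole file has been scanned, with the handler's methods invoked along the way.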

Putting Callbacks to the Test

One major difference between the DOM API application discussed in my November 2000 article and the SAX application discussed here is that, because you are not building the document in memory as DOM does, you can't simply insert a new element to add the rating. Instead, you must rebuild the input document from the SAX events as they arrive and write it to your new output file. The XML tags act as triggers for extracting the character data that becomes the source of your new document. To write this data out to a stream file, I will use the same stream file procedures that I used in the DOM API article to create the new XML document with earned points.

Now, reviewing the source input XML document of Figure 1, I will walk through the SAX parsing process for the first branch (QPWDEXPITV). The root node, or what Figure 4 identifies as the StartDocument event, instigates the first callback event the parser encounters. This corresponds to the first tag shown in Figure 1. It causes the startDocument procedure shown in Figure 3, Section S to be called, where that tag is recreated for the output XML document. Next, upon encountering the start tag of the top-level element, the scanner instigates the StartElement event. This causes the parser to issue a callback to the startElement procedure shown in Figure 3, Section X in the SEC0002RS application. As shown in the procedure interface for this function and the table in Figure 5, this function has two arguments passed to it:

  • A pointer to the element name (Name@).
  • A pointer to a list of any attributes for the element (Attr@).

In Figure 3, Section Y of this procedure, the Name@ pointer is passed to another procedure (not shown in Figure 3) called getName. It returns a pointer (OutString@) to a character string encoded with CCSID 37 (USA English), converted from the DOMString representation (which is essentially Unicode) of the underlying object created by the SAX parser. The OutString@ pointer is used with the %str function in that section to extract the element name for building the start element tag in the output document. Figure 3, Section Z introduces a new function to this application, QxmlAttributeList_getLength, which takes a pointer (Attr@) to the attribute list and retrieves the number of attributes, if any, associated with the current element under the scanner. This is followed by a loop, shown in Figure 3, Section AA, that extracts each attribute name (with the QxmlAttributeList_getName_byIndex function) and the related attribute value (with the QxmlAttributeList_getValue_byIndex function) to construct an attribute phrase of the form name="value" that becomes part of the element tag, of the form <element name="value">.
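
The loop that turns an attribute list back into tag text can be sketched as follows in Python's standard xml.sax module, as an analogy to the by-index Qxml functions rather than a translation of them. The function name build_start_tag and the sample element and attribute names are assumptions for illustration.

```python
from xml.sax.saxutils import quoteattr
from xml.sax.xmlreader import AttributesImpl


def build_start_tag(name, attrs):
    """Rebuild the text of a start tag from an element name and its attributes."""
    parts = ["<" + name]
    names = attrs.getNames()
    for i in range(attrs.getLength()):  # walk the attribute list by index
        attr_name = names[i]
        # quoteattr adds the surrounding quotes and escapes special characters.
        parts.append(" " + attr_name + "=" + quoteattr(attrs.getValue(attr_name)))
    parts.append(">")
    return "".join(parts)


# Hypothetical element and attribute, not taken from Figure 1.
tag = build_start_tag("SystemValue", AttributesImpl({"type": "range"}))
print(tag)
```

An element with no attributes simply skips the loop, so the same routine serves every start tag the parser reports.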

Next, a second element start tag is encountered and passed to the StartElement callback. Then a third is encountered, and once again the StartElement callback is called. The scanner then encounters a string of characters (QPWDEXPITV), which causes the parser to issue a callback to the procedure identified to respond to the Characters event, as shown in Figure 3, Section V. Two arguments, a pointer to the character string and a pointer to an integer representing the length of the character string, are passed to the Characters procedure shown. With the pointer to the character string, the application calls the getName function again to convert the encoding scheme from a Unicode representation (DOMString) to a character string encoded as CCSID 37, shown in Figure 3, Section W. The QxmlTranscode function, discussed in my November 2000 article, is called to accomplish this encoding conversion. (For the discussion of the usage of this function, refer to that article.)

Navigating through the XML document shown in Figure 1, the scanner passes over its first end tag. Accordingly, the parser issues a callback to the application's endElement procedure shown in Figure 3, Section BB. This procedure accepts only one argument, a pointer to the name of the element for which an end tag has been found. In Section CC of that procedure, a test determines whether a valid system value grouping has preceded this element and whether the end tag is for the Points element. If so, the procedure takes the previously saved system value and does a lookup in the compile-time table to retrieve the number of earned points for this policy value. To simplify the illustration, I have not complicated it with an involved algorithm for determining the rating--just a simple lookup. But, since the scanner has not yet encountered the Points tag, the procedure simply creates an ending tag and writes the tag out to a stream file, as shown in Figure 3, Section BB, with the call to QxmlWriteOutPutStream. This function takes the following arguments:

  • A pointer (fd) to the output XML file opened with the call to QxmlOpenNewOutPutStream, shown in Figure 3, Section P.
  • A pointer (OutPutStr@) to the string representing the ending element tag.
  • An integer (InCodePage) containing the value for the character encoding scheme.
  • An integer representing the length of the character string for the ending element.
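
The rebuild-as-you-go logic, including the extra element written when the Points end tag arrives, can be sketched in Python's standard xml.sax module. This is an analogy to SEC0002RS, not a port of it: the element names Policy, Name, Value, and Points in the sample, the contents of the RATINGS table, and the RatingWriter class are all assumptions, and attributes are left out for brevity.

```python
import io
import xml.sax
from xml.sax.saxutils import escape

# Hypothetical rating table standing in for the compile-time table.
RATINGS = {"QPWDEXPITV": 10, "QMAXSIGN": 5}


class RatingWriter(xml.sax.ContentHandler):
    """Copies each event to the output and adds EarnedPoints after Points."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.current = None  # name of the element currently open
        self.sysval = None   # system value name saved from the Name element

    def startElement(self, name, attrs):
        self.current = name
        if name == "Name":
            self.sysval = ""
        self.out.write("<" + name + ">")

    def characters(self, content):
        if self.current == "Name":
            self.sysval += content  # text may arrive in several pieces
        self.out.write(escape(content))

    def endElement(self, name):
        self.out.write("</" + name + ">")
        self.current = None
        if name == "Points" and self.sysval is not None:
            earned = RATINGS.get(self.sysval.strip(), 0)  # simple lookup
            self.out.write("<EarnedPoints>" + str(earned) + "</EarnedPoints>")
            self.sysval = None


SOURCE = (b"<Policy><SystemValue><Name>QPWDEXPITV</Name>"
          b"<Value>30</Value><Points>10</Points></SystemValue></Policy>")
output = io.StringIO()
xml.sax.parseString(SOURCE, RatingWriter(output))
print(output.getvalue())
```

The handler keeps only two small pieces of state between events, which is all the context this rating lookup needs.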

The scanner next encounters the starting element tag for Value, which causes the parser to issue another call to the startElement callback procedure. After this tag is recreated in the output XML document, the character string for Value is encountered, and the parser issues a call to the Characters procedure. To finish this element, the parser next encounters the ending element tag for Value in the source input XML document, which is written to the output stream file. Then the starting element for Points is encountered. This causes the same sequence of steps to be executed as for previous elements, with the distinction that the EarnedPoints element is also written to the output XML document, as detailed above. Then the application returns control to the parser, and the scanner continues to the next SystemValue policy recommendation branch. Finally, the end of the document is reached, which causes the parser to issue a callback to the application procedure endDocument. As shown in Figure 3, Section U, this procedure does not take any arguments and does not do anything. However, you can implement any specific requirements your application may have upon encountering the end of the document.

While neither processing instructions nor ignorable whitespace should be encountered in the source input XML document, I have provided an example of how those XML constructs might be handled. The ignorableWhiteSpace procedure, shown in Figure 3, Section DD, takes two arguments:

  • A pointer (Char@) to the whitespace characters.
  • A pointer (Len@) to an integer holding the length of the whitespace characters.

The processingInstructions procedure, shown in Figure 3, Section EE, takes two arguments:

  • A pointer (target@) to the character string representing the target for the processing instruction.
  • A pointer (data@) to the character string representing the data (parameters) to be passed to the target of the processing instruction.

Processing instructions pass data to an external application (the target) to satisfy some arbitrary requirement. They are somewhat frowned on by most XML developers because they have no relational significance to the structure of an XML document and can be placed anywhere in the document. Processing instructions are seen most often in connection with Extensible Stylesheet Language (XSL), where Extensible Stylesheet Language Transformations (XSLT) are used to transform an XML document in one notation into an XML document in another.

As with most every application of substance, there is usually a bit of housekeeping to be done before exiting. The example application accomplishes its cleanup work in Figure 3, Section R. There the application calls a series of XML API functions unique to IBM's implementation that are used to destroy underlying objects that have been used to support the XML DOM and SAX functions. The call to QxmlDocumentHandler_delete takes the pointer to the document handler, and

          CallP QxmlDocumentHandler_delete(DocHndlr@)

deletes the instance created in Step 4. The call to QxmlErrorHandler_delete takes the pointer to the error handler, and

          CallP QxmlErrorHandler_delete(ErrHndlr@)

deletes the instance created in Step 5. The call to QxmlSAXParser_delete takes the pointer to the SAX parser, and

          CallP QxmlSAXParser_delete(SAXParse@)

deletes the instance created in Step 2. The call to QxmlCloseOutPutStream takes the pointer to the stream file, and

          CallP QxmlCloseOutPutStream(fd)

closes the output stream file that was opened for the output XML document in Figure 3, Section P. With the call to QxmlTerm, the XML parsing environment is ended, and underlying support objects are destroyed and released from memory.

Running the SAX Parser Example

To run this application, issue the following command:

Call SEC0002RS Parm('/home/your_directory/sec0002n.xml' +
          '/home/your_directory/sec0002o.xml')

The first parameter to this command is the qualified file name for the input source document (sec0002n.xml) that you should get with the downloadable code. The second parameter is the qualified target file name for the output XML document (in this example, sec0002o.xml), which can be any name you choose.

This ends the tutorial. I hope you have enjoyed learning about SAX as much as I have enjoyed writing about it. Along with my November 2000 article, "RPG IV and XML Together," this article should give you yet another option for XML-enabling your RPG IV applications and should promote better integration with Web-based and multiplatform applications without requiring expensive middleware solutions.

Jim D. Barnes is an infrastructure architect at PentaSafe Security Technologies, Inc. in Houston, Texas.
