
RPG Has SAX Appeal!


In this part of our RPG XML series, you'll learn how to use RPG's XML-SAX op-code to deal with problematic XML documents and handle situations that XML-INTO cannot deal with.


In the previous two articles in this series, "Handling XML-INTO Problems" and "i5/OS Offers Native XML Support in V5R4", we focused on the capabilities of RPG's XML-INTO. As we saw, this op-code processes an entire document, either as a single piece or, when needed or desired, in "chunks" by using the capabilities of the %HANDLER BIF. There are, however, situations in which this will not work for you. These often relate to limitations in RPG's data structure (DS) capabilities. As you know, a named DS is limited to a maximum size of 64K (at least until V6R1, anyway). What if even a single repeating element will not fit into this? That may sound unlikely, but it doesn't take a huge number of repeating text fields to exceed the limit. Another example, and one that seems to occur quite often, arises when your XML document contains a structure that simply cannot be represented in an RPG DS. To illustrate this, take a look at the new version of our XML document, shown below:



<Category Code="02">
  <Product Code="1234">
(A) <Description type="short">Two slot chrome</Description>
(B) <Description type="long">This beautiful two slot chrome finished toaster is
      a perfect complement to any modern kitchen ...</Description>
  </Product>
  <Product Code="2345">
    <Description type="short">Four slot matt black</Description>
  </Product>
</Category>
<Category Code="14">
  <CatDescr type="short">Coffee Makers</CatDescr>
  <Product Code="9876">
    <Description>10 cup auto start</Description>
  </Product>
</Category>
It is substantively the same as in our previous examples, but with one very significant exception: The <Description> element can now be repeated. If that were the only difference, then we could accommodate it by adding a DIM( ) keyword to the element's definition in the DS. But notice that not only does the element repeat, but there is also a new attribute, type, which is used to indicate the type of description (short or long) that is being defined. This presents us with a problem. Since an attribute is treated in the same way as a child element of the parent, the correct RPG definition for "type" would be this:



d description     DS
d  type                         5a



But this leaves us with nowhere to put the content of the description since the content of a DS is the sum of its subfields and any data placed there would overwrite those subfields. In other words, in our situation, the description would overwrite the type field (or vice versa). Not a lot of help! In theory, a DS that looks like the one below should solve the problem:


d description     DS                  Qualified Dim(2)
d  description               1000a   Varying
d  type                         5a


In this case, the <Description> would be stored in the field description.description and the "type" attribute would be stored in description.type. Makes sense, doesn't it? Maybe to you, but sadly, not to the compiler.


IBM is aware of this deficiency, and it is on their "to-do" list, but don't expect to see it in V6R1. And don't hold me to it working the way I have described it here; IBM may well have other ideas.


So if we cannot create a DS that matches the structure of the XML data, then we cannot use XML-INTO or at least cannot use it for the whole task. So what are our options?


There are effectively three options:


  • The first is to take advantage of RPG's XML-SAX op-code. This can be used either by itself to process the entire document or as a follow-on to an XML-INTO parse to "fill in the gaps." We will be dealing with the usage of XML-SAX in the balance of this article.


  • The second is to reformat the document by using an XSL transform so that it is in a format that can be expressed in RPG terms. This is the approach recommended in the IBM Redbook The Ins and Outs of XML and DB2 UDB for i5/OS. If you have the required XSL skills or are prepared to develop them, this is certainly a valid option and can also help to deal with other issues, such as empty elements. Since the Redbook provides a good working example, we won't duplicate that work here.


  • Another option would be to process the document in two passes using XML-INTO with a different target DS on each pass. You would also need to use the "AllowExtra" and "AllowMissing" processing options in order to persuade the parser to handle the document since neither of the DSs will exactly match the document. This is not as effective as the XML-SAX option, so we will not be discussing it further.


The operation of XML-SAX is very different from that of XML-INTO. XML-INTO parses the data from many elements at a time and places the parsed content into the appropriate field in the target DS or array. XML-SAX on the other hand parses the document one event at a time. Examples of events include the beginning of an element (i.e., its starting tag), the value of an element, the end of an element (i.e., its ending tag), the name of an attribute, the value of the attribute, etc.


With XML-INTO, the use of a handler procedure is optional, but with XML-SAX %HANDLER must always be specified. Your handler procedure will be called for every event that the parser encounters. It is up to your logic to decide if it should simply ignore the event or react to it in some way.


Logic is needed in the handler to recognize and react to the beginning of each element and attribute and to store the values in the appropriate places. You will perhaps get a better idea of the kind of logic that might be required if you study the list below. It represents the sequence of events and the associated data (in parentheses) that would be passed to the handler when processing the section of the XML document that begins at (A) above and ends at (B).


• Start Element (description)

• Attribute Name (type)

• Attribute Characters (short)

• End Attribute (type)

• Element Characters (Two slot chrome)

• End Element (description)


Notice that when we receive the element and attribute data, we have no idea which element/attribute it belongs to. That is up to us to determine. In fact, this is not a difficult task as the data will always belong to the last element/attribute that began but has not yet ended. With so many events being signaled to your handler, you can no doubt see that writing the logic to completely process even a simple document with XML-SAX would be somewhat tedious, requiring a lot of rather repetitive code. Luckily, we rarely require all of the data in a document, and we also have the option to combine XML-SAX with XML-INTO to simplify our task.
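If you would like to see this event flow outside of RPG, the sketch below uses Python's xml.sax module to parse the short-description element and print the events as they arrive. It is only a conceptual parallel: Python delivers an element's attributes bundled with the start-element event, whereas RPG's XML-SAX signals attribute names and values as separate events, but the overall "one event at a time" shape is the same.

```python
# Conceptual parallel to RPG's XML-SAX: an event-driven (SAX) parse of
# the <Description> element. Each handler method fires as the parser
# reaches the corresponding piece of the document.
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        print(f"Start Element ({name})")
        # Note: unlike RPG, Python hands us the attributes here,
        # not as separate events.
        for attr in attrs.getNames():
            print(f"  Attribute ({attr}={attrs.getValue(attr)})")

    def characters(self, content):
        if content.strip():              # skip whitespace-only events
            print(f"Element Characters ({content.strip()})")

    def endElement(self, name):
        print(f"End Element ({name})")

snippet = '<Description type="short">Two slot chrome</Description>'
xml.sax.parseString(snippet.encode(), EventLogger())
```

Running this prints the same sequence of events shown in the list above, with the character data arriving between the start and end events of the element it belongs to.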


So to handle the situation in our example, that is what we will do. We will use XML-INTO to capture the bulk of the data and then process again using XML-SAX to fill in the missing piece: the type codes associated with the descriptions.


Let's look at the code that achieves this (shown at the end of this article).


The first thing to notice is the change in the product DS (A). Notice that we have made the description field an array with two elements and also added the type field as a two-element array. Note that the name of the type field in the DS (descrType) does not match the name of the attribute (type) to ensure that XML-INTO will not try to populate it and to make that fact more obvious to those who come after us. In fact, there is no need to actually include the type in the DS at all, but it is convenient to keep all the data together.


The XML-INTO must have the "allowextra=yes" option specified (B) to accommodate the extra type fields. Without this option, the parse would fail since the new version of the DS no longer corresponds to the XML document. Once XML-INTO has completed, we invoke XML-SAX (C) to reprocess the document.
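For readers who want to experiment with the two-pass idea outside of RPG, here is a hedged Python parallel. The xml.etree bulk parse stands in for XML-INTO and the xml.sax event parse stands in for the follow-up XML-SAX pass; in Python either pass could do the whole job on its own, so the split is purely illustrative of the RPG flow.

```python
# Two passes over the same document, mirroring the article's approach:
# pass 1 grabs the bulk data, pass 2 fills in the attribute values.
import xml.etree.ElementTree as ET
import xml.sax

doc = ('<Category Code="02"><Product Code="1234">'
       '<Description type="short">Two slot chrome</Description>'
       '</Product></Category>')

# Pass 1: bulk parse (the XML-INTO analogue) -- collect the descriptions.
root = ET.fromstring(doc)
descriptions = [d.text for d in root.iter("Description")]

# Pass 2: event parse (the XML-SAX analogue) -- collect the type codes.
class TypeCollector(xml.sax.ContentHandler):
    def __init__(self):
        self.types = []
    def startElement(self, name, attrs):
        if name == "Description":
            self.types.append(attrs.get("type", ""))

collector = TypeCollector()
xml.sax.parseString(doc.encode(), collector)

print(descriptions)      # -> ['Two slot chrome']
print(collector.types)   # -> ['short']
```

After both passes, the descriptions and their matching type codes sit in parallel lists, just as the descrType array sits alongside the description array in the RPG program.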


There is no difference in the definition of %HANDLER, but there is a difference between the information passed to an XML-SAX handler and the information passed to the XML-INTO handler we saw in the last article. Take a look at the prototype at (D) and you will see what I mean. The only parameter that is common to the two versions is the first one, the Communication Area. The remaining parameters are as follows:


event is a four-byte integer that identifies the type of event being processed. Don't worry about the fact that the event is identified by a number. As you will see later, RPG supplies a number of named constants that can be compared with the event value.


pstring is a pointer to the beginning of the string containing the event data (e.g., the element/attribute names or data).


stringLen is the length of the string "pointed to" by the previous parameter. This length must be used to determine if data is present as there are occasions when a valid pointer is passed even though there is no data. Only the number of characters indicated by this parameter should be processed.


exceptionId is an error code identifying any error passed to the handler by the parser. We will not be discussing this in this article. Check the RPG manuals for more information.


Having seen the parameters passed to the handler, it is time to study the mechanics of the handler procedure MySAXHandler. The first step (E) is to check whether any data was received. If no data is received, then the handler simply returns control to the parser. If data is present, then the procedure RmvWhiteSpace( ) is called to remove any unwanted characters and reduce them to a single space. We will look at what I mean by "unwanted" in a moment. Notice that %SUBST is used to pass only the valid portion of the data to the subprocedure. Remember, we were passed only a pointer and a length, and there is probably other data beyond the point indicated by the length parameter. It is worth noting at this point that the field string, which is based on the pointer, can be very useful during debug. If you display it, you will usually be able to see not only the data you are about to process, but also the next part of the XML document. In other words, you will know what to expect next and can perhaps set appropriate breakpoints. This is not guaranteed as sometimes the pointer references a work area, but it is worth remembering.


What do we mean by "unwanted" and why do we need the RmvWhiteSpace routine? Because carriage returns, new lines, tabs, and excess spaces are often present in XML data (sometimes to make it look "pretty"), and we need to remove them from the data. We will not be studying the detail of this procedure, but you will find it included in the version of the program that is available for download. Hopefully, its operation is self-explanatory. (Many thanks to IBM Toronto's Barbara Morris for supplying this routine.)
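The whitespace cleanup can be summed up in one line of code. This Python sketch shows the idea behind the routine (the RPG version itself ships with the downloadable source): collapse tabs, carriage returns, new lines, and runs of spaces into single spaces, then trim the ends.

```python
# A minimal sketch of the whitespace normalization the handler applies
# to incoming event data before using it.
import re

def rmv_whitespace(text: str) -> str:
    # \s matches spaces, tabs, carriage returns, and new lines;
    # any run of them becomes a single space.
    return re.sub(r"\s+", " ", text).strip()

print(rmv_whitespace("Two slot\r\n\t   chrome"))   # -> "Two slot chrome"
```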


At (F), the real work begins. A SELECT group is used to identify the type of event we are handling; this is where the named constants mentioned earlier come into play. For example, *XML_START_ELEMENT represents the event code that announces the arrival of a new element name. In the SELECT group at (G), we then identify the specific element that we are dealing with and process accordingly. All this logic is really doing is setting up the appropriate array indices for the Category, Product, and Description arrays. Since we know that the document we are processing is the same one that we just parsed with XML-INTO, we can afford to short-circuit the process, so no attempt is made to match the product codes with the descriptions or anything.


If the event does not represent the beginning of an element, then we next test to see if it is an attribute name (H). If it is, we check to see if it is the type attribute, and if so, we turn on the waitingForType indicator. This indicator allows us to associate the attribute data when it arrives (I) as belonging to the type attribute. Remember, we said earlier that it is up to us to determine that. We then store the value for the type attribute in the appropriate descrType array element.
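The waitingForType pattern can be sketched in a few lines of Python. The event stream below is hand-built to match the sequence listed earlier (the event names are illustrative, not RPG's actual *XML_... constants): a flag set when the "type" attribute name arrives tells us that the next attribute-characters event carries that attribute's value.

```python
# The "waitingForType" state pattern from the handler: events arrive one
# at a time, and a flag ties the attribute data to the attribute name
# that preceded it.
descr_type = []            # stands in for the descrType array
waiting_for_type = False

def handle(event, data):
    global waiting_for_type
    if event == "ATTR_NAME" and data == "type":
        waiting_for_type = True            # next ATTR_CHARS belongs to "type"
    elif event == "ATTR_CHARS" and waiting_for_type:
        descr_type.append(data)            # store the attribute's value
        waiting_for_type = False

# The event stream for one <Description type="short"> element:
for ev in [("START_ELEMENT", "Description"), ("ATTR_NAME", "type"),
           ("ATTR_CHARS", "short"), ("END_ATTR", "type"),
           ("ELEM_CHARS", "Two slot chrome"), ("END_ELEMENT", "Description")]:
    handle(*ev)

print(descr_type)   # -> ['short']
```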


After processing the document, the XML-SAX parse completes and control returns to the program's main line at (J). At this point, the complete content of the XML document has been stored in our category DS, so our program can process or store that data as necessary. In this simple example, we will just display the data. The logic simply loops through all of the categories and products. As in our previous example, the category loop is controlled by the RPG-supplied xmlElements count in the Program Status Data Structure, which was populated by the XML-INTO operation, and the product loop completes when a blank product code is encountered. The format of our XML document is such that there must be a short description, so the first elements of the description and type arrays are displayed. At (K), the logic then tests to see if a second set is present and, if it is, displays the relevant data.


And that's really all there is to it. I won't describe it here, but I have included in the source code accompanying this article a utility program (XMLSAXLIST) that you might find useful when studying XML documents that you need to process. It uses XML-SAX to parse the document and produces a listing of all the events signaled and the length and content of the associated data. If you run the program, you will be able to see the effect of the RmvWhiteSpace procedure as the original length of the data item is included. If you have any questions about the operation of the program, please let me know.


H Option(*NoDebugIO : *SrcStmt )


// This count is populated by XML-INTO whenever the INTO

// variable is an array

D progStatus SDS

D xmlElements 20i 0 Overlay(progStatus: 372)


(D) D MySAXHandler Pr 10i 0

D commArea Like(dummyCommArea)

D event 10i 0 Value

D pstring * Value

D stringLen 20i 0 Value

D exceptionId 10i 0 Value


D RmvWhitespace pr 65535a Varying

D input 65535a Varying Const


D category DS Qualified Dim(20)

D code 2a

D catDescr 20a

D product LikeDS(product) Dim(50)


D product DS Qualified

D code 4a

(A) D descrType 5a Dim(2)

D description 600a Dim(2)

D mSRP 7p 2

D sellPrice 7p 2

D qtyOnHand 5i 0


D XML_Source S 256a Varying

D Inz('/Partner400/XML/Example5.xml')


// Short version of Description for display purposes

D dispDescription...

D S 40a


D dummyCommArea S 1a

D i S 5i 0

D p S 5i 0




(B) XML-INTO category
        %XML(XML_Source: 'case=any doc=file allowextra=yes');

    // XML-INTO has filled the category array
    // Next we use XML-SAX to fill in the missing type details
(C) XML-SAX %HANDLER(MySAXHandler: dummyCommArea)
        %XML(XML_Source: 'doc=file');


Dsply ('xmlElements = ' + %char(xmlElements) );


// The XML parser's element count is used to control the loop

(J) For i = 1 to xmlElements;
      Dsply ('Cat: ' + category(i).code + ' ' +
             category(i).catDescr );

      For p = 1 to %Elem(category.product);
        If category(i).product(p).code = *Blanks;
          Leave; // Exit once blank product code entry located
        EndIf;

        // Process the current product entry
        dispDescription = category(i).product(p).description(1);
        Dsply ('Product: ' + dispDescription);
        Dsply ('Type: ' + category(i).product(p).descrType(1));

        // If second description is present, display details
(K)     If category(i).product(p).description(2) <> *Blanks;
          dispDescription = category(i).product(p).description(2);
          Dsply ('Product: ' + dispDescription);
          Dsply ('Type: ' + category(i).product(p).descrType(2));
        EndIf;
      EndFor;
    EndFor;


Jon Paris

Jon Paris's IBM midrange career started when he fell in love with the System/38 while working as a consultant. That love affair ultimately led him to join IBM.


In 1987, Jon was hired by the IBM Toronto Laboratory to work on the S/36 and S/38 COBOL compilers. Subsequently, Jon became involved with the AS/400 and in particular COBOL/400.


In early 1989, Jon was transferred to the Languages Architecture and Planning Group, with particular responsibility for the COBOL and RPG languages. There, he played a major role in the definition of the new RPG IV language and in promoting its use with IBM Business Partners and users. He was also heavily involved in producing educational and other support materials and services related to other AS/400 programming languages and development tools, such as CODE/400 and VisualAge for RPG.


Jon left IBM in 1998 to focus on developing and delivering education focused on enhancing AS/400 and iSeries application development skills.


Jon is a frequent speaker at user group meetings and conferences around the world, and he holds a number of speaker excellence awards from COMMON.


