Before we start, I should mention that much of the XML support relies heavily on the compound data structure support added to RPG IV in the V5R2 release. I will be briefly covering the relevant aspects of this support during the course of this article, but for a more comprehensive overview, click here.
Let's start by taking a look at the simple XML document that we will use for our initial explorations. The document contains the details of products produced by our company, including the category, product code, description, selling price, and quantity in stock. The details are grouped by category and also include a description and code for each category. Hopefully, the layout of the document will be obvious from the short extract shown below:
<Description>Two slot chrome</Description>
<Description>Four slot matt black</Description>
<Description>10 cup auto start</Description>
RPG IV's most powerful XML support, the new XML-INTO opcode, operates by matching the names and hierarchy of the elements in the XML document to a matching data structure (DS) hierarchy in the program. So the first step is to build the required DS.
As you will note, the root of the XML document is an element named Products, so our starting point is to create a DS with this name (see label A in the code below). Note that I have used the keyword "Qualified" in defining this structure; the reason for this will become apparent in a moment. Next in the XML comes the element Category. If this were a simple element, we could just list it as a subfield in the DS, but as you can see, it is not. Category is a compound element containing both the category description and the details of all products within that category, so what we need is a way to nest one DS inside another. Fortunately, V5R2 provided us with this capability in the form of the LIKEDS keyword. So, at label B, we simply define the subfield category as looking like the DS category! If you are unfamiliar with these new DS capabilities, this may seem strange to you; after all, RPG can't have two things with the same name, can it? This is where the keyword "Qualified" that I mentioned earlier comes into play. By adding this keyword to the definition of the DS, we changed the name of the category subfield to products.category. That is to say, the name category is qualified by the name of its parent DS. In fact, the use of the keyword "Qualified" is compulsory for any DS that contains the LIKEDS keyword in one of its subfield definitions.
Notice that I also added the keyword DIM(20) to the definition of products.category since Category is a repeated element. This ability to dimension a DS as an array, as opposed to being limited to the old multiple-occurrence data structures (MODS), is another feature of V5R2 and absolutely essential to being able to handle the nested elements contained within the vast majority of XML documents. In this case, our program makes the assumption that there will never be more than 20 categories. We will look later in this article series at how to handle situations where the potential number of elements takes us beyond RPG's current limits.
(A)  D products        DS                  Qualified
(B)  D   category                          LikeDS(category) Dim(20)
(C)  D category        DS                  Qualified
     D   description                 20a
     D   code                         2a
(D)  D   product                           LikeDS(product) Dim(50)
(E)  D product         DS                  Qualified
     D   description                 40a
     D   code                         4a
     D   mSRP                         7p 2
     D   sellPrice                    7p 2
     D   qtyOnHand                    5i 0
     D XML_Source      S            256a   Varying
Looking at the definition of the category DS (C), you can see that we have defined three fields. The first (description) matches the Description element. The second (code) is less obvious, but it's simply the result of the way XML-INTO treats attributes: an attribute of a compound element is considered to be at the same hierarchical level as any elements within it. If that sounds complicated, maybe this will make it a little more obvious. A Code attribute coded on the Category start tag is handled exactly as if Code had been coded as a child element of Category, alongside Description.
The third entry (product) in the structure will be used to represent the repeated element Product and is therefore represented as a nested array DS (D). The actual definition of the DS is shown at label E.
Although it may seem strange to you at first, no actual data will be stored in the category (C) or product (E) data structures. They exist only so that they can be referenced via the LIKEDS keywords. IBM has indicated that in the V6R1 release, a new keyword will be added to the language to allow us to indicate directly that such DSes are to be used only as templates, not for data storage. In the meantime, if you don't want to "waste" the memory they occupy, you can simply add the keyword BASED to their definition. This indicates to the compiler that you will later set a pointer to indicate where in memory this DS actually resides. But you don't actually set the pointer since you will never (deliberately!) reference the fields in these DS. When I use this technique, I tend to code it like so:
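A minimal sketch of the technique follows. The basing pointer name pTemplate is a hypothetical choice made for this illustration. Because the pointer is never defined explicitly, the compiler defines it implicitly, and because it is never set, no storage is allocated to either structure.

```rpgle
       // Template-only structures. pTemplate (hypothetical name) is never
       // set, so these DS occupy no storage and must never be referenced
       // directly. They exist only as targets for LikeDS.
     D category        DS                  Qualified Based(pTemplate)
     D   description                 20a
     D   code                         2a
     D   product                           LikeDS(product) Dim(50)

     D product         DS                  Qualified Based(pTemplate)
     D   description                 40a
     D   code                         4a
     D   mSRP                         7p 2
     D   sellPrice                    7p 2
     D   qtyOnHand                    5i 0
```

Note that LIKEDS copies only the subfield layout, not the BASED keyword, so products.category still has real storage of its own.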
I do this in the hope that my fellow programmers will understand the intention behind the subsequent definitions.
So how do we reference (for example) the selling price of the first product in the first category? Simple: products.category(1).product(1).sellPrice gives us what we need! Yes, I know! I can hear it now: "Hey, Jon, that's a lot of typing!" There are many responses to such a statement, one of the more polite being "Yes, it is. Get over it!" But rather than respond directly, I will instead pose you a question: If you couldn't reference the field in this way, just how many lines of code would you have to type to be able to reference this field in any other way?
Of course, if you are using a decent editor, such as WDSC, then it really is not a problem, as the code-assist function (Ctrl+Space) can pop up a list of candidate fields whenever you need it. You simply select the appropriate field from the list. Who said long field names require more typing? Speaking of WDSC, if you are still having problems understanding exactly how the data in the products DS is arranged, perhaps this screenshot of the WDSC outline view of the program will make things a little clearer.
Figure 1: The WDSC outline view shows how the data in the products DS is arranged.
OK, so we have finally completed the DS required to map the XML document content. All that remains is to code the operations to actually parse the document and fill up the elements. Luckily, this is the easy part. The simple XML-INTO operation shown below (G) does the entire job.
(G) XML-INTO products %XML(XML_Source: 'doc=file case=any');
The first operand of the op-code identifies the DS products as the target for the operation. The second operand is the new %XML built-in function (BIF), and it serves two purposes. First, it identifies the XML document via its first parameter (XML_Source), and second, it supplies processing options to the XML parser. In our example, we have specified two options.
The first, doc=file, informs the parser that the first parameter contains the name of the IFS file holding the XML document. As you can see at label F in our example, this is the fully qualified path name of the XML document. If this option is not supplied, the parser assumes that the field identified by the parameter actually contains the complete XML document.
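To illustrate the alternative, here is a small sketch in which the doc option is omitted, so the first parameter is taken to hold the XML text itself. The item DS, the xmlText field, and the sample values (including the product code T001) are all hypothetical and are not part of the article's example.

```rpgle
     D item            DS                  Qualified
     D   code                         4a
     D   description                 40a

     D xmlText         S            256a   Varying

      /free
        // No doc option: xmlText is assumed to contain the document itself
        xmlText = '<Item><Code>T001</Code>'
                + '<Description>Two slot chrome</Description></Item>';

        // case=any lets the element name Item match the DS name item
        XML-INTO item %XML(xmlText: 'case=any');

        *InLR = *On;
      /end-free
```

The same DS-matching rules apply either way; only the source of the XML text differs.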
The second option, case=any, specifies that element names in the document should be converted to uppercase before being compared to the names in the RPG DS. Other options include case=lower and case=upper. The case=upper option provides the best performance as it says that the element names are already in uppercase and therefore need no conversion. However, you will probably only be able to use this option if you control the definition of the XML schema. You can probably guess what case=lower means. Yup, the element names are all in lowercase and should be converted to uppercase. Although in theory this should perform better than case=any, in practice this is not true for reasons that I won't go into here.
That's all there is to it. This one simple little opcode does all of the heavy lifting for us, and once control is returned to our program, all of the data in the XML document has been parsed and placed in our products DS with numeric conversion where appropriate.
Well, actually, it is far more likely that the program would have halted with an error message. Why? Because the program as it stands will work only if the document contains exactly 20 Category elements and each of those contains exactly 50 Product elements! Needless to say, such rigid conditions are unlikely to occur very often.
So how do we handle a situation in which the XML document contains fewer than the declared number of elements? The first thing we need to do is to add another entry to the %XML option list. The one that we need in this case is 'allowmissing=yes'. This tells the parser that it is acceptable if the document does not contain the exact number of elements that we identified in the array definition. However, there is a problem with this approach. The "allowmissing" option does not provide any degree of control. There is no way to say that we expect to get between one and 20 Category entries but that, for each Category, the Description element must be present. (Note: This type of error can be avoided by validating the XML document against its schema, but currently that has to be a separate operation outside of the RPG program. For more information on one approach to achieving this, see the Redbook The Ins and Outs of XML and DB2 for i5/OS.)
The result is that once I use this option, the parser will be perfectly happy if almost anything is missing! Luckily, there is a relatively simple way to deal with this: initialize the entire DS to a known value...say, *HIVAL...before we begin the parse. We can then simply test any compulsory fields to ensure that they were correctly populated. We can also use this value as a means of determining when the last entry in the array has been processed.
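The approach might be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the IFS path /tmp/products.xml, the pointer name pTemplate, and the loop counters c and p are hypothetical, and the sketch assumes that the Code value is always present, so it can serve as the end-of-data marker. Only character subfields are tested, since a packed or integer subfield that still holds x'FF' bytes is not safe to reference.

```rpgle
     D products        DS                  Qualified
     D   category                          LikeDS(category) Dim(20)

     D category        DS                  Qualified Based(pTemplate)
     D   description                 20a
     D   code                         2a
     D   product                           LikeDS(product) Dim(50)

     D product         DS                  Qualified Based(pTemplate)
     D   description                 40a
     D   code                         4a
     D   mSRP                         7p 2
     D   sellPrice                    7p 2
     D   qtyOnHand                    5i 0

       // Hypothetical IFS path, used only for this illustration
     D XML_Source      S            256a   Varying Inz('/tmp/products.xml')
     D c               S             10i 0
     D p               S             10i 0

      /free
        // Fill every byte with x'FF' so untouched subfields are detectable
        products = *HIVAL;

        XML-INTO products %XML(XML_Source:
                               'doc=file case=any allowmissing=yes');

        for c = 1 to %elem(products.category);
          if products.category(c).code = *HIVAL;
            leave;                     // first entry the parser never filled
          endif;
          if products.category(c).description = *HIVAL;
            dsply ('No description: category ' + products.category(c).code);
          endif;
          for p = 1 to %elem(products.category(c).product);
            if products.category(c).product(p).code = *HIVAL;
              leave;                   // no more products in this category
            endif;
            dsply products.category(c).product(p).description;
          endfor;
        endfor;

        *InLR = *On;
      /end-free
```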
Before I close, there are two other aspects of the XML-INTO support I should briefly touch on. The first concerns the handling of numeric fields. In my example, I defined several elements (for example, MSRP) as being numeric. This works beautifully (in fact, you can even specify the H (half adjust) extender with the XML-INTO opcode), but be aware that should an error occur during the numeric conversion, the parser will simply terminate and issue an error. Unless you can guarantee that the XML document contains only "good" data, you might want to take an alternative approach and define all numeric elements as character fields in the DS. You can then attempt the numeric conversion (probably using the %DEC BIF) under the control of your own program. This way, you can report any errors encountered but still be able to process the balance of the document. A suggested version of the revised product DS is shown below:
     D product         DS                  Qualified
     D   description                 40a
     D   code                         4a
     D   mSRP                        12a
     D   sellPrice                   12a
     D   qtyOnHand                    7a
Don't forget when setting the size of these character fields to allow enough room for the decimal point and any possible sign characters.
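A sketch of the program-controlled conversion might look like this. The standalone fields mSRPText and mSRP are hypothetical stand-ins for a character subfield of the product DS and its numeric counterpart, and the sample value is deliberately invalid. Status code 105 is the one RPG documents for invalid characters in a character-to-numeric conversion.

```rpgle
     D mSRPText        S             12a   Inz('1x9.99')
     D mSRP            S              7p 2

      /free
        monitor;
          // Convert the character value to packed 7,2
          mSRP = %dec(mSRPText: 7: 2);
        on-error 105;
          // Invalid numeric data: report it and carry on with a default
          mSRP = 0;
          dsply ('Bad MSRP value: ' + mSRPText);
        endmon;

        *InLR = *On;
      /end-free
```

Because the error is trapped for a single field, the rest of the parsed document remains available for processing.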
The second aspect of XML-INTO that I'd like to discuss here simplifies the specification of the target DS. It is in fact acceptable to ignore the root element in the document (i.e., Products in our example) and to simply load directly into a DS array corresponding to the next level (i.e., Category). Since Category is the first repeating element, we can code it as a DS array. This has an additional benefit in that RPG can now actually tell us how many elements were processed. Indeed, if the Category element were the only repeating element in the document (it isn't because Product also repeats), we would not even need to specify the "allowmissing" option. The revised sections of the example, including the Program Status Data Structure (PSDS) that contains the element count, are shown below.
       // When the INTO target is an array, xmlElements will contain a count
       // of the number of elements loaded
     D progStatus     SDS
     D   xmlElements                 20i 0 Overlay(progStatus: 372)
       // Note that the products DS is no longer required
     D category        DS                  Qualified Dim(20)
     D   code                         2a
     D   description                 20a
     D   product                           LikeDS(product) Dim(50)
       // Note modified XML-INTO now targets the category DS
        XML-INTO category %XML(XML_Source:
                               'case=any doc=file allowmissing=yes');
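To show how the element count might be used, here is a self-contained sketch that loads the category array and then loops over only the entries that were actually populated. The IFS path, the pointer name pTemplate, and the counter c are hypothetical. The product template uses the character-field version suggested earlier.

```rpgle
     D progStatus     SDS
     D   xmlElements                 20i 0 Overlay(progStatus: 372)

     D category        DS                  Qualified Dim(20)
     D   code                         2a
     D   description                 20a
     D   product                           LikeDS(product) Dim(50)

     D product         DS                  Qualified Based(pTemplate)
     D   description                 40a
     D   code                         4a
     D   mSRP                        12a
     D   sellPrice                   12a
     D   qtyOnHand                    7a

       // Hypothetical IFS path, used only for this illustration
     D XML_Source      S            256a   Varying Inz('/tmp/products.xml')
     D c               S             10i 0

      /free
        XML-INTO category %XML(XML_Source:
                               'case=any doc=file allowmissing=yes');

        // xmlElements now holds the number of Category elements loaded,
        // so there is no need to test for a marker value at this level
        for c = 1 to xmlElements;
          dsply (category(c).code + ' ' + category(c).description);
        endfor;

        *InLR = *On;
      /end-free
```

The count covers only the target array itself. The nested product arrays still need a marker value, or some similar technique, to find their last populated entry.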
That's all I have time for in this article.
In the next episode in this series, I will discuss how to handle situations in which RPG's current size limits "get in the way." And if we have time, we will also take a brief look at XML-INTO's little brother, XML-SAX.