RPG/400 and CL string handling has been enhanced; make the most of it.
Brief: String operations have changed dramatically since the AS/400 was first announced. We'll give you a concise review of the current state of string operations in RPG/400 and CL. Use these operations instead of array processing for string manipulations; it will make your programs much simpler to code and maintain.
When I played bass guitar many years ago, I virtually never changed the strings. After all, they were quite thick and never broke; and after they were stretched and broken in, I knew the spots on the strings that sounded flat and those that were resonant. Long after playing my last note, I find that I've retained my old string habits in my programming work. I have been working with RPG for as long as the bass has been retired to the closet, and now I find that I have resigned myself to using arrays to work with strings. But many recent enhancements to RPG/400 lead me to believe that it is time for some new thinking.
It seems that all of the complaints about the lack of string operations voiced by RPG programmers over the years have been answered all at once. The elementary needs of joining strings, extracting parts of strings, changing case of letters within strings and scanning and clearing strings are now operation codes in RPG. The introduction of these new operations in Version 1 of OS/400 met with criticism because of some basic oversights. The operations have been corrected in newer versions, and even more operations and options will be available in Version 2 Release 2.
This article reviews several of the string operations and processing available with Version 2 Release 1 RPG/400 and CL. If you are like me, still using array processing for basic string operations, you will find this very useful. When I began using some of the new operations, I constantly consulted the manual to check the syntax and options; but even so, the operations were much easier to use than the equivalent array processing.
Perhaps the most fundamental issue in using strings is defining their contents. The CL DCL statements in Figure 1 declare character variables (*CHAR) and assign them initial values at the same time.
RPG presents several available methods to initialize strings. You can use data structure subfields or named constants, as I've done in Figure 2. You can also code strings by using literals in an array or by using combinations of MOVEL and MOVE operations.
There are two reasons for defining the RPG strings as data structure subfields rather than as named constants. First, the subfield can be used to create a string variable of a certain length. The length of a named constant is specified by the number of characters in the constant. Second, the value of the subfield can be changed during the program execution; named constants cannot be changed. Using the subfield and named constant techniques shown, you can assign string values up to 256 characters long. Chapter 9 of the RPG User's Guide (SC09-1348) includes several examples of how to define long strings.
There are several operations you can perform on strings. I'll define them in full first; then you can refer to Figure 3 where you can see them in action all at once.
Clearing: Clearing a character string means filling the entire string with blanks. The old value of the string is lost. In CL you use the CHGVAR command, specifying a single blank space in the VALUE parameter. You cannot specify a null string (two single quotes with nothing between them) for the value; the program will fail to compile.
In RPG, the simplest way to clear a field is with the CLEAR operation. This has the same effect as moving *BLANK or *BLANKS to the field. CLEAR also has the panache of being somewhat object-oriented, so that you can "overload" CLEAR and use it to initialize numeric or indicator fields.
Assigning: Assigning means giving a variable a specific non-blank value; the old value contained in the variable is lost. You can assign either a literal or another variable.
In CL, you use the CHGVAR command, supplying the new value in the VALUE parameter. If the variable is longer than the value being assigned, the remaining space is padded with blanks. If the variable is shorter than the value, CHGVAR fills the variable, beginning from the left-hand side. The rest of the string is ignored.
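The padding and truncation rules can be seen in a minimal sketch. The variable names &SHORT and &LONG are hypothetical, chosen only for illustration:

```cl
PGM
  DCL VAR(&SHORT) TYPE(*CHAR) LEN(5)
  DCL VAR(&LONG)  TYPE(*CHAR) LEN(15)

  /* Value shorter than the variable: &LONG receives 'A TEST' */
  /* followed by nine blanks of padding                       */
  CHGVAR VAR(&LONG) VALUE('A TEST')

  /* Value longer than the variable: only the leftmost five   */
  /* characters fit, so &SHORT receives 'LONGE' and the rest  */
  /* of the value ('R STRING') is ignored                     */
  CHGVAR VAR(&SHORT) VALUE('LONGER STRING')
ENDPGM
```

In both cases the previous contents of the receiving variable are replaced completely, which is the behavior that RPG's MOVEL and MOVE do not give you.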
In RPG, the MOVEL and MOVE operations assign values to variables; the difference between them is that MOVEL begins the assignment from the left end of the string, while MOVE begins the assignment from the right.
If the string is shorter than the variable, the rest of the variable is not padded with blanks as it is in CL; the extra characters are left untouched! To avoid this problem, you must use the CLEAR operation before MOVEL or MOVE to ensure that there won't be any garbage characters left over from the old value. One of the Version 2 Release 2 enhancements discussed in the sidebar will make the CLEAR operation unnecessary in this situation.
By the way, this is a good time to note, once and for all, that you should be in the habit of using the RPG MOVEL operation as opposed to the MOVE operation, unless you have a very specific reason for doing otherwise. The reason, as I have already explained, is that MOVE performs the assignment beginning from the right end of the variable. If both the variable and the new value are of the same length, you can get away with MOVE; but if the variable is longer than the value, you may not get what you expected. (It once took me several hours to track down a bug that was caused by using MOVE where MOVEL was intended.) It's unfortunate that the operation codes are named as they are; it would have been easier if the MOVE operation performed the MOVEL function, and if there was a MOVER (Move Right) operation instead.
Concatenation: Concatenation is the operation you use to splice two strings together, making up a longer string. For example, think of one variable having your first name (FIRST) and another variable that holds your last name (LAST). Concatenating FIRST with LAST creates a field with your full name.
CL allows the use of operators *CAT, *BCAT and *TCAT. *CAT simply joins strings as they are. *BCAT and *TCAT remove all trailing blanks from the first string and then join it with the second string. The difference between *BCAT and *TCAT is that *BCAT forces one blank space between the two before joining them. So, to concatenate FIRST and LAST, you should use *BCAT. None of the CL concatenate operations strip leading blanks from the strings.
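A side-by-side sketch of the three operators may help. The variable lengths and the name values here are assumptions made up for the example:

```cl
PGM
  DCL VAR(&FIRST) TYPE(*CHAR) LEN(8)  VALUE('JOHN')
  DCL VAR(&LAST)  TYPE(*CHAR) LEN(8)  VALUE('SMITH')
  DCL VAR(&FULL)  TYPE(*CHAR) LEN(20)

  /* *CAT keeps the four trailing blanks of &FIRST,           */
  /* giving 'JOHN    SMITH'                                   */
  CHGVAR VAR(&FULL) VALUE(&FIRST *CAT &LAST)

  /* *TCAT drops those blanks and inserts nothing,            */
  /* giving 'JOHNSMITH'                                       */
  CHGVAR VAR(&FULL) VALUE(&FIRST *TCAT &LAST)

  /* *BCAT drops those blanks and inserts one blank,          */
  /* giving 'JOHN SMITH'                                      */
  CHGVAR VAR(&FULL) VALUE(&FIRST *BCAT &LAST)
ENDPGM
```

Each CHGVAR also pads &FULL with blanks on the right, so no clearing is needed between the three assignments.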
The RPG CAT operation is more flexible than its CL counterpart, in that you can specify the number of intervening blanks for the resulting string. Factor 1 and Factor 2 contain two strings to be concatenated. The string in Factor 2 is concatenated to the end of the string specified in Factor 1. The result field specifies the resulting string. The number of intervening blanks is indicated by appending :n to Factor 2, where n indicates the number of blanks. If you omit :n, the two strings are concatenated without removing or adding any blanks, the same way that a CL *CAT works.
One of the recent enhancements to the RPG CAT operation is the operation extender field, which is column 53 of the C-spec. If you enter a "P" into that field, the result field is padded on the right with blanks after the operation. This allows you to combine a CLEAR operation with the CAT operation, rather than having to specifically clear the result field before performing the concatenation.
Substring: Another fundamental operation, supported in both CL and RPG, is the substring operation. Substring is used to extract part of a string. You specify the starting position for the extraction, the number of characters to extract and the variable where the extracted characters are placed.
CL performs this operation with the %SUBSTRING (or %SST) function. %SST requires three positional parameters: the name of the original string variable, the starting location to begin extracting and the number of characters to extract.
In RPG, you use the SUBST operation. Factor 2 must have the name of the original string variable; Factor 1 is the number of characters to extract; and :n appended to Factor 2 indicates where to begin extracting. If :n is omitted, a starting position of 1 is assumed.
In both CL and RPG, the starting position and the number of characters to extract can be numeric variables or literals. Be careful when you specify these values. If you specify a starting position or number of characters to extract that equals zero or goes beyond the end of the string, you'll get a program error. You can use the MONMSG command (in CL) or an indicator in columns 56-57 (in RPG) to monitor for that condition.
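As a hedged sketch of guarding a CL substring, the example below uses numeric variables for the position and length and follows the CHGVAR with a command-level MONMSG. The generic message IDs are an assumption made to keep the example short; a specific message ID would give a tighter monitor. The variable names are hypothetical.

```cl
PGM
  DCL VAR(&STRING) TYPE(*CHAR) LEN(10)  VALUE('A TEST')
  DCL VAR(&PART)   TYPE(*CHAR) LEN(10)
  DCL VAR(&START)  TYPE(*DEC)  LEN(3 0) VALUE(3)
  DCL VAR(&LEN)    TYPE(*DEC)  LEN(3 0) VALUE(4)

  /* Extract four characters beginning at position 3          */
  CHGVAR VAR(&PART) VALUE(%SST(&STRING &START &LEN))

  /* If &START or &LEN is zero or runs past the end of        */
  /* &STRING, blank out &PART instead of ending abnormally    */
  MONMSG MSGID(CPF0000 MCH0000) EXEC(CHGVAR VAR(&PART) VALUE(' '))
ENDPGM
```

With the values shown, &PART receives 'TEST'. The monitor matters only when &START and &LEN are calculated at run time.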
The CL substring is more flexible than its RPG counterpart in that CL can use a substring as a "pseudo variable." For example:
CHGVAR VAR(%SST(&STRING 3 1)) VALUE('R')
We don't want to extract part of the string; rather, we want to change part of a string. Starting at position 3 for a length of 1 character, &STRING is changed, placing a letter "R" there. If &STRING originally had the value "SOLE," it now becomes "SORE."
RPG currently lacks this capability. To code an equivalent function, you would have to use the SUBST operation to break the string apart, then the CAT operation to rejoin the string pieces with the replacement values. Or you could use the tired, tried-and-true method of moving the string to an array of single characters and changing one of the array elements.
Scanning: Scanning is the process of examining a string, or portion thereof, to find a shorter string. For example, the string "concatenate" contains the string "cat." When you scan "concatenate" for the string "cat," the scan function returns the value 4, the position where "cat" begins.
In RPG, you use the SCAN operation. If you want to look for a NEEDLE in a HAYSTACK, you need to put the search argument (NEEDLE) in Factor 1 and the HAYSTACK in Factor 2. Optionally, Factor 1 can contain a :n extender, indicating how many characters of NEEDLE you want to look for; similarly, Factor 2 can contain another :n extender, indicating where in HAYSTACK to begin the search.
You can use indicators or an optional result field to collect the result of the SCAN. If the result field is a numeric array, an entry is added each time the search argument is found. For example, if you search for "ss" in "Mississippi", two entries would appear in the resulting array. The first element contains 3 and the second element contains 6. This process continues as long as space is available in the array and occurrences of the search argument are found.
If the result field is not an array, it must be a numeric field, in which case only the first occurrence of the search argument is reported. In either situation, a result of zero means that the search element was not found. For example, if you search for 'NEEDLE' in 'HAYSTACK,' the result will be zero.
Two resulting indicators can be coded for the SCAN operation, although the result field method is preferable. The indicator in columns 56-57 turns on if there is an error in the SCAN operation-for instance, if the starting position for the search, coded as :n, is greater than the number of positions in the string being searched. The indicator in columns 58-59 is activated if at least one occurrence is found. An indicator in 58-59 is required if no result field is coded.
CL uses a different method for scanning: you CALL an IBM-supplied program called QCLSCAN. QCLSCAN doesn't have RPG's capability to find all occurrences or to scan for just a few characters of the search argument, but it offers something else in return: the ability to scan using wild cards, to ignore case (for letters A-Z) and to trim trailing blanks from the search argument before scanning. The parameters for QCLSCAN are shown in Figure 4.
QCLSCAN reports the result of the scan by returning a result value in a decimal variable. If NEEDLE was found, the result variable contains the starting position within HAYSTACK. If not found, it returns zero. If QCLSCAN ends in error, it returns a negative number. You can see the complete documentation for QCLSCAN in the CL Programmer's Guide (SC41-8077).
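The three kinds of result can be handled as in the following sketch. It scans with a wild card, using "?" as an arbitrary choice of wild card character; the message texts are likewise invented for the example.

```cl
PGM
  DCL VAR(&STRING)  TYPE(*CHAR) LEN(10)  VALUE('A TEST')
  DCL VAR(&STRLEN)  TYPE(*DEC)  LEN(3 0) VALUE(10)
  DCL VAR(&STRPOS)  TYPE(*DEC)  LEN(3 0) VALUE(1)
  DCL VAR(&PATTERN) TYPE(*CHAR) LEN(4)   VALUE('T?ST')
  DCL VAR(&PATLEN)  TYPE(*DEC)  LEN(3 0) VALUE(4)
  DCL VAR(&XLATE)   TYPE(*CHAR) LEN(1)   VALUE('0')
  DCL VAR(&TRIM)    TYPE(*CHAR) LEN(1)   VALUE('0')
  DCL VAR(&WILD)    TYPE(*CHAR) LEN(1)   VALUE('?')
  DCL VAR(&RESULT)  TYPE(*DEC)  LEN(3 0)

  CALL PGM(QCLSCAN) PARM(&STRING &STRLEN &STRPOS &PATTERN +
                         &PATLEN &XLATE &TRIM &WILD &RESULT)

  /* Positive: found; zero: not found; negative: error        */
  IF COND(&RESULT *GT 0) THEN(SNDPGMMSG MSG('Pattern found'))
  IF COND(&RESULT *EQ 0) THEN(SNDPGMMSG MSG('Pattern not found'))
  IF COND(&RESULT *LT 0) THEN(SNDPGMMSG MSG('QCLSCAN error'))
ENDPGM
```

Because the second position of the pattern is a wild card, 'T?ST' should match 'TEST' in &STRING and return the same position a scan for 'TEST' would.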
Translation: Translating is the process of replacing all occurrences of one character with another, within a string. For example, the string MIZZIZZIPPI can be translated so that all Z's are changed to S's-that is, if spelling is important to you.
RPG handles translation with the XLATE operation. Factor 1 must contain two strings separated with a colon (:), listing the "from" pattern and the "to" pattern. Factor 2 contains the string being translated, optionally followed by :n to indicate where translation should start, and the result field names the field that will contain the translated string. An indicator coded in columns 56-57 turns on if the XLATE operation ends in error (e.g., if the starting location is beyond the end of the string being translated).
CL translates by calling the IBM-supplied program QDCXLATE which, like QCLSCAN, is described in full in the CL Programmer's Guide. CL's QDCXLATE does not have RPG's flexible ability to create translate patterns on the fly. Instead, it forces you to create a translation table in order to perform any kind of translation. Fortunately, IBM provides several tables with OS/400, such as QSYSTRNTBL to convert lowercase to uppercase, or QASCII to convert EBCDIC characters to ASCII characters. See Figure 5 for QDCXLATE parameters and the translation tables provided by IBM.
Check: Finally, string checking is the process of determining if all the characters in the string belong to a certain set of characters. For instance, you can check that a given string (which is supposed to contain a hexadecimal notation like "12F6") only contains the valid hexadecimal nybbles 0 to 9 and A to F.
RPG performs this function with the CHECK operation. Factor 1 is a string containing all the characters that are considered valid. Factor 2 contains the string to be checked, with an optional :n extension if you want to start checking at a certain position. The result field can contain either a numeric field or a numeric array, which will have the position(s) of the invalid characters found in the string. If all characters in the string are valid, the result field is zero. If you use a numeric array, CHECK reports multiple occurrences of invalid characters. An optional indicator in columns 56-57 can be used to monitor for an invalid condition such as a starting position beyond the end of the string.
CL has no equivalent of the CHECK operation.
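If you need the function in CL anyway, one possible workaround is to examine the string one character at a time and scan the set of valid characters for each one. The sketch below is only an illustration of that idea, not an IBM-supplied technique. All variable names are hypothetical, and it assumes QCLSCAN accepts a one-character pattern as Figure 4 suggests.

```cl
PGM
  DCL VAR(&HEX)    TYPE(*CHAR) LEN(4)   VALUE('12G6')
  DCL VAR(&HEXLEN) TYPE(*DEC)  LEN(3 0) VALUE(4)
  DCL VAR(&VALID)  TYPE(*CHAR) LEN(16)  VALUE('0123456789ABCDEF')
  DCL VAR(&VALLEN) TYPE(*DEC)  LEN(3 0) VALUE(16)
  DCL VAR(&STRPOS) TYPE(*DEC)  LEN(3 0) VALUE(1)
  DCL VAR(&CHAR)   TYPE(*CHAR) LEN(1)
  DCL VAR(&CHRLEN) TYPE(*DEC)  LEN(3 0) VALUE(1)
  DCL VAR(&XLATE)  TYPE(*CHAR) LEN(1)   VALUE('0')
  DCL VAR(&TRIM)   TYPE(*CHAR) LEN(1)   VALUE('0')
  DCL VAR(&WILD)   TYPE(*CHAR) LEN(1)   VALUE(' ')
  DCL VAR(&RESULT) TYPE(*DEC)  LEN(3 0)
  DCL VAR(&POS)    TYPE(*DEC)  LEN(3 0) VALUE(1)
  DCL VAR(&BADPOS) TYPE(*DEC)  LEN(3 0) VALUE(0)

  /* Take the next character and look for it in the valid set */
LOOP: IF COND(&POS *GT &HEXLEN) THEN(GOTO CMDLBL(DONE))
  CHGVAR VAR(&CHAR) VALUE(%SST(&HEX &POS 1))
  CALL PGM(QCLSCAN) PARM(&VALID &VALLEN &STRPOS &CHAR +
                         &CHRLEN &XLATE &TRIM &WILD &RESULT)

  /* Not found (or error): remember the position and stop     */
  IF COND(&RESULT *LE 0) THEN(DO)
    CHGVAR VAR(&BADPOS) VALUE(&POS)
    GOTO CMDLBL(DONE)
  ENDDO

  CHGVAR VAR(&POS) VALUE(&POS + 1)
  GOTO CMDLBL(LOOP)

  /* &BADPOS is still 0 here if every character was valid     */
DONE: ENDPGM
```

Like CHECK with a numeric result field, the loop reports only the first invalid position. With the sample value, the "G" in position 3 should stop the loop.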
There are several special values in RPG that are called figurative constants. Figurative constants can be used to set an entire string to specific values with a minimum of coding. The values of these constants are already defined; you simply use them as required.
Four of the most confusing figurative constants are *BLANK, *BLANKS, *ZERO and *ZEROS. The confusion arises from uncertainty about the equivalence of *BLANK and *BLANKS, and *ZERO and *ZEROS. In fact, the values in each pair are interchangeable. Moving *BLANK or *BLANKS to a field yields the same result: the field is set to all blanks. Similarly, you can use the Z-ADD operation to set a numeric field to zero, using either *ZERO or *ZEROS. In addition, if you use a move operation to move *BLANK or *BLANKS to a numeric field, the field is set to zero. Rather than struggling with any of this, simply use the CLEAR operation and don't worry about the type of field or whether or not it really is set to blanks.
Two figurative constants that are useful when setting keys for database file operations are *LOVAL and *HIVAL. When used with character fields, *LOVAL is defined as a string of nulls (hexadecimal '00'), the lowest character in the collating sequence, and *HIVAL is defined as a string of hexadecimal 'FF', the highest character. When used with numeric fields, *LOVAL sets the field to all nines with a negative sign and *HIVAL sets it to all nines with a positive sign. You can set a key field to the *LOVAL value prior to using the SETLL operation, or to the *HIVAL value prior to using the SETGT operation. You can then follow SETLL with a READ operation to read from the very beginning of the file, or follow SETGT with a READP operation to read from the very end.
A somewhat odd figurative constant, *ALL is used in conjunction with a move operation to repeat a character string through the length of a field. Now that named constants are part of RPG, it is probably simpler and clearer to set a field with a named constant value.
Finally, there are two figurative constants that you can use to set or test indicator values. Those are *ON and *OFF. *ON is set to all ones, *OFF to all zeros. You can use these to reset an *IN indicator value, or as part of the test on an IF or DOUxx operation.
Tying it Together
This overview of character string functions should give you a good starting point for exploring the versatility of string operations by using the new capabilities of RPG and CL. Figure 3 summarizes and illustrates with RPG and CL code how these capabilities can be put to work. You'll still need to refer to the manuals for the exact syntax if these operations are new to you, but I think you'll find it very worthwhile to study, become familiar with and start using them. In my projects, I have gone back into existing code which uses arrays to manipulate strings and replaced it with the equivalent string operation. This was fairly easy to do and I think it will make future maintenance much easier. Try it for yourself!
RPG String Enhancements to Come in V2R2
When OS/400 Version 2 Release 2 becomes available later this year, there will be several additions to the RPG string operations. Briefly stated, these include:
Hexadecimal literals and hexadecimal named constants. This will make it easier to define hexadecimal fields, rather than using the current BITOF/BITON method. This is mostly of use if you are adding display attribute characters to output fields, or if you are working with user-defined data streams or other device control programming.
The CHEKR (Check Reverse) operation. This will work similarly to the CHECK operation, the difference being that the checking proceeds from right to left. This can be used to find the rightmost character of a string that is not found in the set of valid characters. Documentation also suggests that this can be used to determine the length of a string. (So why not just make a LEN opcode, and make it that much easier?)
Optional Factor 1 for CAT and SUBST operations. For the CAT operation, the result field is used as the Factor 1 field if you leave Factor 1 blank. This is similar to the numeric operations (ADD, SUB) that allow an optional Factor 1. For the SUBST operation, Factor 1 is used to specify the number of characters to extract from the string. If you do not supply a Factor 1 value, the length used is the number of characters from the specified starting position within the string to the end of the string. This is useful if you need to extract a rightmost part of the string without calculating the length of the portion to extract.
Operation code extender for the MOVEL, MOVE and MOVEA operations. You can include the operation code extender (column 53) with those operations. If you specify a letter "P" in column 53, the result field is padded with blanks after the move is complete. This eliminates the need to clear the result field prior to performing the move.
String Handling in RPG/400 and CL
Figure 1: Defining the Variables

    DCL VAR(&STRING1) TYPE(*CHAR) LEN(10) VALUE('A TEST')
    DCL VAR(&STRING2) TYPE(*CHAR) LEN(15) +
          VALUE('LONGER STRING')
Figure 2: Initializing RPG Strings

    .... ....1.... ....2.... ....3.... ....4.... ....5.... ....6.... ....7
         I            DS
         I I            'A TEST'                  1  10 STRNG1
         I I            'LONGER STRING'          11  25 STRNG2
         I              'RPG NAMED CONSTANT-  C         CONST
         I              'FIELD'
    .... ....1.... ....2.... ....3.... ....4.... ....5.... ....6.... ....7
Figure 3: The Details of String Handling

In the Before and After values, each b represents a blank. Where two values appear on a line, the first is STRING1 (STRNG1 in RPG) and the second is STRING2 (STRNG2 in RPG).

Clear
Fills a string with blanks from beginning to end.

  CL:
    CHGVAR VAR(&STRING2) VALUE(' ')

  RPG:
    ..2....+....3....+....4....+....5....+
              CLEAR          STRNG2

  Effect produced:
    Before: bbbbbbbbbb AbTESTbbbbbbbbb
    After:  bbbbbbbbbb bbbbbbbbbbbbbbb

Assignment
Gives one string the value of another string. The old value of the receiving string is lost.

  CL:
    CHGVAR VAR(&STRING2) VALUE(&STRING1)

  RPG:
    ..2....+....3....+....4....+....5....+
              CLEAR          STRNG2
              MOVELSTRNG1    STRNG2

  Effect produced:
    Before: AbTESTbbbb LONGERbSTRINGbb
    After:  AbTESTbbbb AbTESTbbbbbbbbb

Concatenation
Joins two short strings together to create a longer string.

  CL:
    CHGVAR VAR(&STRING2) VALUE(&STRING1 *BCAT 'STRING')

  RPG:
    ..2....+....3....+....4....+....5....+
    STRNG1    CAT  'STRING':1STRNG2    P

  Effect produced:
    Before: AbTESTbbbb
    After:  AbTESTbbbb AbTESTbSTRINGbb

Substring
Extracts part of a string and places the extracted part into another string.

  CL:
    CHGVAR VAR(&STRING2) VALUE(%SST(&STRING1 3 4))

  RPG:
    ..2....+....3....+....4....+....5....+
    4         SUBSTSTRNG1:3  STRNG2    P

  Effect produced:
    Before: AbTESTbbbb AbTESTbSTRINGbb
    After:  AbTESTbbbb TESTbbbbbbbbbbb

Scan
Searches for the existence of a pattern within a string, returning the position where found or 0 if not found. CL provides translation and trimming before performing the scan, in addition to wild cards. RPG supports multiple scans if the Result field is a numeric array.

  CL:
    DCL VAR(&STRING1) TYPE(*CHAR) LEN(10)
    DCL VAR(&STRLEN)  TYPE(*DEC)  LEN(3 0) VALUE(10)
    DCL VAR(&STRPOS)  TYPE(*DEC)  LEN(3 0) VALUE(1)
    DCL VAR(&PATTERN) TYPE(*CHAR) LEN(4)   VALUE('TEST')
    DCL VAR(&PATLEN)  TYPE(*DEC)  LEN(3 0) VALUE(4)
    DCL VAR(&XLATE)   TYPE(*CHAR) LEN(1)   VALUE('0')
    DCL VAR(&TRIM)    TYPE(*CHAR) LEN(1)   VALUE('0')
    DCL VAR(&WILD)    TYPE(*CHAR) LEN(1)   VALUE(' ')
    DCL VAR(&RESULT)  TYPE(*DEC)  LEN(3 0)
    CALL PGM(QCLSCAN) PARM(&STRING1 &STRLEN &STRPOS &PATTERN +
           &PATLEN &XLATE &TRIM &WILD &RESULT)

  RPG:
    ..2....+....3....+....4....+....5....+
    'TEST'    SCAN STRNG1:1  POS

  Effect produced:
    String: AbTESTbbbb    Result = 3

Translation
Replaces all occurrences of one (or more) characters with another character, according to a translation table or pattern.

  CL:
    DCL VAR(&STRLEN) TYPE(*DEC)  LEN(5 0) VALUE(10)
    DCL VAR(&TBL)    TYPE(*CHAR) LEN(10)  VALUE('QSYSTRNTBL')
    DCL VAR(&TBLLIB) TYPE(*CHAR) LEN(10)  VALUE('QSYS')
    CHGVAR VAR(&STRING1) VALUE('Translate')
    CALL PGM(QDCXLATE) PARM(&STRLEN &STRING1 &TBL &TBLLIB)

  Effect produced:
    Before: Translateb
    After:  TRANSLATEb

  RPG:
    ..2....+....3....+....4....+....5....+
              MOVEL'06/01/92'STRNG1
    '/':'-'   XLATESTRNG1    STRNG2

  Effect produced:
    Before: 06/01/92bb
    After:  06/01/92bb 06-01-92bbbbbbb

Check
Verifies that all characters in a string belong to a given set defined as another string.

  CL:
    Not supported.

  RPG:
    ..2....+....3....+....4....+....5....+
    'ABCDEF'  CHECKSTRNG1    POS

  Effect produced:
    String: ABCDE12Fbb    Pos = 6
Figure 4: QCLSCAN Program Parameters

    Parameter  Type                 Description
    ---------  -------------------  ------------------------------------------
    STRING     CHAR(1) - CHAR(999)  String to be scanned
    STRLEN     PACK(3,0)            Length of STRING
    STRPOS     PACK(3,0)            Position in STRING at which to start scan
    PATTERN    CHAR(1) - CHAR(999)  Pattern to scan for
    TRANSLATE  CHAR(1)              If "1", translates lowercase EBCDIC
                                    characters "a" - "z" to uppercase
    TRIM       CHAR(1)              If "1", trims trailing blanks from
                                    PATTERN before scanning
    WILD       CHAR(1)              Wild card character; a pattern position
                                    holding it is not tested. If blank, all
                                    characters in PATTERN are used in the
                                    scan. The first character of the pattern
                                    cannot be a wild character.
    RESULT     PACK(3,0)            Result field:
                                    *GT 0 = position of first character of
                                            PATTERN in STRING
                                    *EQ 0 = PATTERN not found in STRING
                                    -1 = PATTERN longer than STRING
                                    -2 = PATTERN length less than 1
                                    -3 = first character of PATTERN is a
                                         wild character
                                    -4 = PATTERN is all blank and TRIM is
                                         requested
                                    -5 = invalid STRPOS within STRING
Figure 5: QDCXLATE Program Parameters

    Parameter  Type       Description
    ---------  ---------  ---------------------------------------------
    BUFLEN     PACK(5,0)  Length of field to be translated,
                          RANGE(1 32767)
    BUFFER     CHAR(*)    Character field to be translated; the
                          translation is returned in this field
    SBCSTBLN   CHAR(10)   SBCS (Single Byte Character Set) translation
                          table to use
    SBCSTBLL   CHAR(10)   Library for SBCSTBLN

Note: SBCSTBLN can be a user-defined translation table or one of the IBM-supplied tables.

    Table             Description
    ----------------  ---------------------------------------------------
    QSYS/QASCII       EBCDIC => ASCII
    QSYS/QEBCDIC      ASCII => EBCDIC
    QSYS/QSYSTRNTBL   lower case => upper case (unaccented "a" - "z")
    QUSRSYS/QCASE256  lower case => upper case (extended characters, with
                      accent marks, translated to unaccented, "a" - "z")