Best of Cozzi: Find and Replace

RPG
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

(Editor's note: Bob is on vacation this week, so he offers this "best of" article in case you missed it the first time.)

I've decided to give you a little gift. This issue includes a full-functioning procedure that performs a certain task and returns a value to the caller. The function that is performed is one that I've needed frequently in RPG, so I decided to write it myself rather than wait for the compiler to do it.

In my experience writing C++ applications for Windows and Linux, I've had the need for a Find/Replace function time and time again. In the world of OS/400 and RPG, I've needed one very frequently in some of the tools that I use. Sure, there are plenty of "Find" functions in OS/400, but there are few "Replace" functions-or more appropriately, few "Find and Replace" functions.

When the year 2000 was a future problem, one of my consulting clients was paying developers to sit at their desks and modify existing applications by changing the names of fields in over 1,500 source members. The developers had gone through nearly 78 programs over a three-week period when I was brought into the picture.

The client asked me to help with the conversion. Of course, dollar signs popped into my head, but I told him that he might be better off using a tool to do this kind of work. At the time, we knew of no such tool, so I gave him a quote to write it for him. He agreed, and I delivered a command that not only sought out the old field names, but also replaced the old names with the new names and was sensitive to RPG's columnar nature. Yes, the new field names were longer than the original field names.

Today, there is more of a need for a Find/Replace procedure than there is for a full-scale source scan and replace command. I've often needed to search for a pattern in a character field and replace it with something else; whether it is doubling up quotes, removing dashes, or simply changing portions of a street address. A Find/Replace procedure is very important in my day-to-day programming activities.

RPG IV already does a very good job of using the SCAN opcodes or %SCAN built-in function. It also does a good job of replacing data using either %SUBST or the %REPLACE built-in functions. So, to do a simple one-time Find/Replace, all an RPG IV programmer needs to do is something like this:

.....CSRn01..............OpCode(ex)Extended-factor2+++++++++++++++++++++++++++
0001 C                   eval      nPos =  %scan(szFindText : szData)
0002 C                   if        nPos > 0
0003 C                   eval      szData = %replace(szReplaceText : szData :
0004 C                                                nPos : nReplaceLen)
0005 C                   endif                             

The %scan built-in function on Line 1 searches the field named szData for the search pattern. The search pattern is stored in the field named szFindText. If %scan finds a match, the position within szData is returned and stored in the field named nPos. %SCAN is a case-sensitive scan; upper- and lowercase letters are taken into account so that, for example, an uppercase M does not match a lowercase m.

If a match is detected, nPos will be greater than zero (Line 2). Then, on Line 3, the %Replace built-in function is called to replace the original text with the new text. The field named szReplaceText (parameter 1) contains the replacement text--that is, it contains the text that will be inserted into the szData field (parameter 2). The szReplaceText is inserted into szData beginning in the positions identified by the nPos field (parameter 3). The fourth parameter of the %Replace built-in function (nReplaceLen) can be confusing. It identifies the number of characters in szData that will be deleted from the szData field--starting with the position specified in the nPos field--before the szReplaceText is inserted. If this parameter is not specified, then the length of the szReplaceText parameter is used.

Caveat Emptor: %Scan is case-sensitive only. You'd have to write your own "ignore case" procedures and convert the database before the scan is performed if you need that kind of functionality.

The fourth parameter of %Replace is optional, and, if unspecified, it defaults to the field length of the first parameter. This is important because the field length is not always the length you want. Often the desired length is the length of the text in the field. So something like
%Len(%TrimR(szReplaceText)) may be required.

You can delete text from the target field (szData in the example) by using a variable-length field for the szReplaceText, setting its length to zero, and then specifying a replacement length.

In the following example, a field is searched for a string of text, and then that text is deleted from the original field.

   0001 C                   eval      nPos =  %scan(ToLower(%TrimR(szFindText))
0002 
C                                            : ToLower(szData) )
0003 
C                   if        nPos > 0
0004 
C                   eval      szReplaceText = ''
0005 
C                   eval      nReplaceLen = %Len(%TrimR(szFindText))
     
 *                     // Delete some text from the szData field
0007 
C                   eval      szData = %replace(''  : szData :
0008
 C                                                nPos : nReplaceLen)
0009 
C                   endif                             

By inserting an empty string--via double apostrophes ('')--into the szData field at the position where the text was found, no new text replaces the old, and the original text is deleted.

A Better Way to Search and Destroy

I used variations of this type of technique many times and decided it was time to write a nicely packaged Find and Replace procedure. Of course, I would prefer a %FindReplace built-in function in the RPG IV language, but that is still years away. Listed in Figure 1 is the RPG IV source code for a module named FINDREPL that contains several procedures. The primary procedure is named FindReplace, and along with it are CvtCase (convert between upper/lowercase), ToLower (convert to lowercase), and ToUpper (convert to uppercase).

There are also three include members (aka /COPY's) that are used by the source listed in Figure 1. All three of these source members should be stored in the QCPYSRC source file. The source member FindRepl contains the procedure prototype for the FindReplace procedure, the source member APIErrEx contains two data structure formats used by OS/400 APIs, and the source member GetOpDesc contains the prototype for the ILE CEEDOD procedure. These source members are listed in Figures 2, 3, and 4 respectively.

Find/Replace Parameters

The FindReplace procedure accepts five parameters, but only the first three are required. The following is a brief description of each parameter. See the source code for more details.

Parameter
Usage
Description
szFind
Input
The Find Text-This is the text that will be searched for in the field specified for the third parameter. This field may be up to 2048 positions in length. It is a varying-length field and CONST, so fields that are fixed or varying-length may be specified with lengths up to 2048 positions.
szReplace
Input
The Replacement Text-his is the text that will replaced the "Find text" (parameter 1) in the field specified for the third parameter. This field may be up to 2048 positions in length. It is a varying-length field and CONST, so fields that are fixed or varying-length may be specified with lengths of up to 2048 positions. If this parameter is a varying-length field and it is empty (i.e., its length is equal to zero), then the find text is deleted from the search field (third parameter).
szData
Input
The Search Field-This parameter identifies the text that will be searched for the value specified in the find text (first parameter).
bOptions
Optional
Input
Options-This parameter controls the search-and-replace action. Currently, only two options are supported.
FR_MATCHCASE-Performs a case-sensitive find/replace. Unless this is specified, a case-insensitive find is performed.
FR_FINDFIRST-Locate and replace the first occurrence of the Find text. By default, all occurrences are located and replaced.
This parameter uses bit patterns to detect the options settings. There are compiler directives in the code that control whether it is being compiled under OS/400 V5R2 (where bit-wise operators are built-in). If not under V5R2 or later, then "grunt" methods are used to check the bit mask.
nCount
Optional
Output
Replace Count-This field, if specified, is set to the number of occurrences of the find/replace text. It must be a 4-byte integer field. These types of fields are normally defined as 10i0 fields.

The Find/Replace procedure source compiles into a module object. The module may be bound into your programs by copy (as the module itself) or by reference (stored in a service program). To store it in a service program, use the CRTSRVPGM command after the CRTRPGMOD command.

Compiling the Find/Replace Procedure

To compile the source, just use the regular option 15 in PDM or set up your compiler setting in CodeStudio to run the following command:

CRTRPGMOD MODULE(&C/&M) SRCFILE(&C/&F) SRCMBR(&M) DBGVIEW(*SOURCE)

To create a service program with just this module and procedure in it, run the following command:

CRTSRVPGM SRVPGM(TOOLS/FINDREPL) MODULE(TOOLS/FINDREPL) EXPORT(*ALL)

This command assumes TOOLS is the name of the library where you've stored the module object.

Summary

The Find/Replace procedure is only as fast as the corresponding %SCAN and %REPLACE built-in functions. So it should perform about as fast as anything you can write on your own.

So use it and enjoy it!

0001 H NOMAIN BNDDIR('QC2LE'COPYRIGHT('(c) 2002 by Robert Cozzi, Jr.')
0002  **  (c) 2002 by Robert Cozzi, Jr. All rights reserved.
0003  **  This is part of the "RPG ToolKit" for OS/400 Version 2.
0004  **  Delete or comment out the following /DEFINE statement  
0005  **  when compiling into production.
0006  /DEFINE   DEBUG
      
      
0007  **  This defines the FR_MODULE symbol. It is used by /IF statements.
0008  **  It allows them to avoid included the imported fields. Since they
0009  **  are exported by this module, they cannot also be imported.
0010  /DEFINE   FR_MODULE
      
0011  /COPY QCPYSRC,findrepl
0012  /COPY QCPYSRC,getopdesc
0013  /COPY QCPYSRC,apierrex

0014  ** EXPORTED GLOBAL VARIABLES
0015 D FR_OVERFLOW     S               N   EXPORT INZ(*OFF)
0016 D FR_RESULTLEN    S             10I 0 EXPORT INZ(0)
0017 D FR_REPLACECNT   S             10I 0 EXPORT INZ(0)

0018  **  PRIVATE global variables is used by 
0019  **  these procedures.
0020 D m_szCvtData     S          32000A   
0021 D m_szData        S          32000A   Varying

0022  **  Prototype for the Convert Case procedure
0023 D CvtCase         PR         32000A   Varying
0024 D  InString                  32000A   Const Varying
0025 D  nOption                      10I 0 Const OPTIONS(*NOPASS)

0026  **  Constants used to control conversion to 
0027  **  upper or lower case 
0028 D CC_TOUPPER      C                   Const(0)
0029 D CC_TOLOWER      C                   Const(1)

0030  **  Control structure used  by CvtData for 
0031  **  the call to QglConvertCase.
0032 D FRCB            DS                  
0033 D  ReqType                      10I 0 Inz(1)
0034 D  CCSID                        10I 0 Inz(0)
0035 D  CvtTo                        10I 0 Inz(0)
0036 D  Reserved                     10A   Inz(*ALLX'00')
     
0037  **  Prototype for the QlgConvertCase procedure
0038 D QlgCvtCase      PR                  ExtProc('QlgConvertCase')
0039 D  ctrlBlock                          Like(FRCB)
0040 D  inString                  32000A   Const
0041 D  OutString                 32000A
0042 D  nLength                      10I 0 Const
0043 D  APIErrorDS                         Like(API_ErrEx)


0044 *********************************************************
0045 **  F I N D  R E P L A C E  -  Find and Replace 
0046 *********************************************************
0047 **      Parameters:
0048 **             szFind - (Input) Text to search for
0049 **             szReplace - (Input) Text to use replace
0050 **                         (Can be empty) %Len(szReplace) = 0
0051 **             szData - (I/O) Searched/Replace Field
0052 **             bOptions - (Input) Bit flags indicating the
0053 **                        type of search/replace to perform.
0054 **                        FR_MATCHCASE = Case-sensitive scan
0055 **                        FR_WORD - match whole words only
0056 **                                  (not implemented yet)
0057 **             nCount - (I/O optional) An optional field
0058 **                        that will receive the number of
0059 **                        find/replace operations performed.
0060 **                        
0061 **             Also sets  m_nFindReplaceLength
0062 *********************************************************
0063 **      Return Value: Number of instances replaced
0064 *********************************************************

0065 P FindReplace     B                   Export
0066 D FindReplace     PI         32000A   Varying
0067 D  szFind                     2048A   Value Varying
0068 D  szReplace                  2048A   Const Varying
0069 D  szData                    32000A   Const Varying
0070 D  bOptions                      1A   Value OPTIONS(*NOPASS)
0071 D  nCount                       10I 0 OPTIONS(*NOPASS)
    
     
0072 *********************************************************
0073 **  Parameter Number Identifiers
0074 *********************************************************
0075 D p#_Options      C                   Const(4)
0076 D p#_Count        C                   Const(5)
0077 D p#_Data         C                   Const(3)

0078 *********************************************************
0079 **  Local variables
0080 *********************************************************
0081 D szScanData      S                   Like(szData) 
0082 D nFLen           S             10I 0
0083 D nReplaceLen     S             10I 0
0084 D nLen            S             10I 0
0085 D nPos            S             10I 0
0086 D nStart          S             10I 0
0087 D nFindCount      S             10I 0
0088 D bMatchCase      S              1N   Inz(*OFF)
0089 D bMatchFirst     S              1N   Inz(*OFF)

0090  /IF DEFINED(DEBUG)
0091  ** These are debug variables and are not used by the procedure.
0092  ** To remove them, comment out the /DEFINE DEBUG statement
0093  ** at the beginning of the source member.
0094 D db_Parms        S             10I 0
0095 D db_SearchLen    S             10I 0
0096 D db_FindLen      S             10I 0
0097  /ENDIF

0098  /IF DEFINED(DEBUG)
0099 C                   Eval      db_Parms = %parms
0100 C                   Eval      db_FindLen = %Len(szFind)
0101 C                   Eval      db_SearchLen = %Len(szData)
0102  /ENDIF

0103 C                   Eval      FR_REPLACECNT = 0
0104 C                   Eval      FR_RESULTLEN  = 0
0105 C                   Eval      FR_OVERFLOW   = *OFF

0106  **  Retrieve the length of the data to be scanned.
0107 C                   Eval      nLen = %Len(szData)

0108  **  The length of the Find pattern (data to search for)
0109  **  is saved and used as the Replacement-string length on
0110  **  the %Replace built-in function.
0111 C                   Eval      nFLen = %Len(szFind)
0112 C                   Eval      nReplaceLen = %Len(szReplace)

0113 C                   If        %Parms >= p#_Options
0114  /IF DEFINED(*V5R2M0)
0115  **  If we're compiling under OS/400 V5R2 or later, then
0116  **  use the %BITAND built-in function to check for control flags.
0117 C                   if        %BitAnd(bOptions : FR_MATCHCASE) =
0118 C                                 FR_MATCHCASE
0119 C                   Eval      bMatchCase = *ON
0120 C                   endif
0121 C                   if        %BitAnd(bOptions : FR_FINDFIRST) =
0122 C                                 FR_FINDFIRST
0123 C                   Eval      bMatchFirst = *ON
0124 C                   endif
0125  /ELSE
0126  **  If we're compiling under OS/400 V5R1 or earlier
0127  **  use the TESTB opcode to check for control flags.
0128 C                   TestB     FR_MATCHCASE  bOptions                 88
0129 C                   Eval      bMatchCase = *IN88
0130 C                   TestB     FR_FINDFIRST  bOptions                 88
0131 C                   Eval      bMatchFirst = *IN88
0132  /ENDIF
0133 C                   endif

0134  **  Copy the data to a global variable so that it can be modified
0135  **  and then returned to the caller.
0136 C                   Eval      m_szData = szData

0137 C                   if        NOT bMatchCase
0138  **  If ignoring case, then convert the scan pattern to lower case 
0139  **  and the scanned data to lower case. Note only a copy of the
0140  **  scanned data is converted.
0141 C                   eval      szFind = CvtCase(szFind : CC_TOLOWER)
0142 C                   eval      szScanData = CvtCase(m_szData : CC_TOLOWER)
0143 C                   else
0144  **  If not converting, then just extract the scanned data into 
0145  **  the local variable for use by the %scan built-in.
0146 C                   eval      szScanData = m_szData
0147 C                   endif

0148  **  Use a local variable to contain the scanned data.
0149  **  That way if ignore case is specified, the original data
0150  **  is not converted.
0151 C                   eval      nPos =  %scan(szFind : szScanData)
0152 C                   dow       nPos > 0
0153  **  Return the number of times the replacement was made.
0154 C                   Eval      nFindCount = nFindCount + 1
                        
0155  **  Replace the data in the scanned string.
0156  **  Since we can't touch storage outside the scanned field
0157  **  we use %SUBST to only touch the storage we own.
0158 C                   eval      m_szData = %replace(szReplace : m_szData :
0159 C                                                nPos : nFLen)
0160  **  Calculate the next scan's starting position.
0161  **  If this NOT done, a scan/replace the does not change
0162  **  the data could result in a never ending loop.
0163 C                   eval      nStart = nPos + nReplaceLen

0164  **  If we're only matching the first occurance, then we are finished.
0165 C                   if        bMatchFirst
0166 C                   Leave
0167 C                   endif
0168  **  We're done with the replacement. If ignore case is specified
0169  **  then we have to convert the scanned data to lower case and
0170  **  extract it into the scanned field for the next pass.
0171 C                   if        NOT bMatchCase
0172 C                   eval      szScanData = CvtCase(m_szData : CC_TOLOWER)
0173 C                   else
0174  **  If NOT converting to lower case, just copy the data for
0175  **  the next pass through the loop.
0176 C                   eval      szScanData = m_szData 
0177 C                   endif

0178  **  Now rescan the data and if there are still instances,
0179  **  loop back up and do another pass through the replace routine.
0180 C                   eval      nPos =  %scan(szFind : szScanData : nStart)
0181 C                   enddo

0182  **  We return the number of replacements made. Note Zero
0183  **  indicates that no replacements were performed.
0184  **  i.e., the scan pattern was not found.
0185 C                   if        %Parms >= p#_Count  
0186 C                   eval      nCount = nFindCount
0187 C                   endif
0188 C                   Eval      FR_RESULTLEN = %Len(m_szData)
0189 C                   Eval      FR_REPLACECNT = nFindCount
0190 C                   return    m_szData 
0191 PFindReplace      E

0192 *********************************************************
0193 **  C V T C A S E  - Convert between lower/upper case
0194 **      Parameters:
0195 **                  InString - (Input) Text to convert
0196 **                  nOption  - (Input) Control Option
0197 **                                     0 = To Upper
0198 **                                     1 = To Lower
0199 **      Return Value: Converted input text.
0200 **      (c) Copyright 2002 by Robert Cozzi, Jr.
0201 **          All rights reserved.
0202 *********************************************************
0203 P CvtCase         B                   Export
0204 D CvtCase         PI         32000A   Varying
0205 D  InString                  32000A   Const Varying
0206 D  nOption                      10I 0 Const OPTIONS(*NOPASS)
0207 D nLen            S             10I 0

0208 C                   Eval      nLen = %Len(InString)
0209 C                   if        nLen <= 0
0210 C                   return    ''
0211 C                   endif

0212  **  Clear the work variable used to return the upper or lower case
0213  **  value to the caller. Note szCvtData is a global variable.
0214 C                   Eval      %Subst(m_szCvtData:1:nLen) = *BLANKS

0215  **  QlgCvtCase uses 0 to convert to upper case, and 1 to convert 
0216  **  to lower case. We use the named constants CC_TOUPPER and CC_TOLOWER
0217  **  which are passed into this procedure on the nOption parameter.
0218  **  The parameter value is copied to the FRCB.CvtTo subfield.
0219 C                   if        %Parms >= 2
0220 C                   Eval      CvtTo = nOption
0221 C                   else
0222 C                   reset                   CvtTo
0223 C                   endif
0224 C                   CallP     QlgCvtCase(FRCB : InString : m_szCvtData :
0225 C                                        nLen : API_ErrEx)
0226  **  Use %subst to return only the converted data to the caller.
0227  **  Since the return value is VARYING, this causes the return values
0228  **  length to be set to the value of the nLen field.
0229 C                   return    %Subst(m_szCvtData : 1 : nLen )
0230 PCvtCase          E

0231 *********************************************************
0232 **  T O L O W E R - Convert to lower case
0233 **      Parameters:
0234 **                  InString - (Input) Text to convert
0235 **      Return Value: Converted input text.
0236 *********************************************************
0237 P ToLower         B                   Export
0238 D ToLower         PI         32000A   Varying
0239 D  InString                  32000A   Const Varying
0240  ** This is just a simplified wrapper procedure that 
0241  ** calls CvtCase() to convert to lower case.
0242 C                   Return    CvtCase(inString : CC_TOLOWER)
0243 P ToLower         E

0244 *********************************************************
0245 **  T O U P P E R - Convert to upper case
0246 **      Parameters:
0247 **                  InString - (Input) Text to convert
0248 **      Return Value: Converted input text.
0249 *********************************************************
0250 P ToUpper         B                   Export
0251 D ToUpper         PI         32000A   Varying
0252 D  InString                  32000A   Const Varying
0253  ** This is just a simplified wrapper procedure that 
0254  ** calls CvtCase() to convert to upper case.
0255 C                   Return    CvtCase(inString : CC_TOUPPER)
0256 P ToUpper         E

Figure 1: FindReplace and supporting procedures' source

      /IF NOT DEFINED(FINDREPLACE)
      /DEFINE FINDREPLACE

      /IF NOT DEFINED(FR_MODULE)
      ** IMPORTED FindReplace Variables
     D FR_OVERFLOW     S               N   IMPORT
     D FR_RESULTLEN    S             10I 0 IMPORT
     D FR_REPLACECNT   S             10I 0 IMPORT
      /ENDIF

     *********************************************************
     **  F I N D R E P L A C E - Find and Replace text
     *********************************************************
     **      Parameters:
     **             szFind - (Input) Text to search for
     **             szReplace - (Input) Text to use replace
     **                         (Can be empty) %Len(szReplace) = 0
     **             szData - (I/O) Search/Replace Field
     **             nLen - (Input, optional) Length of the szData
     **                     (third) parameter. If unspecified
     **                     the length if retrieved.
     **             bOptions - (Input) Bit flags indicating the
     **                        type of search/replace to perform.
     **                        FR_MATCHCASE = Case-sensative scan
     **                        FR_WORD - match whole words only
     **                                  (not implemented yet)
     **                        FR_FF   - Find first occurrance only.
     *********************************************************
     **      Return Value: Number of instances replaced
     *********************************************************
     D FindReplace     PR         32000A   Varying
     D szFind                      2048A   Value Varying
     D szReplace                   2048A   Const Varying
     D szData                     32000A   Const Varying 
     D bOptions                       1A   Value OPTIONS(*NOPASS)
     D nCount                        10I 0 OPTIONS(*NOPASS)

      **  Find the first occurrance only. 
     D FR_FINDFIRST    C                   Const(X'01')
     D FR_FF           C                   Const(X'01')
      **  Find only matching case characters.
     D FR_MATCHCASE    C                   Const(X'02')
     D FR_MC           C                   Const(X'02')

      *************************************************
      **  T O U P P E R 
      **  Convert lower to upper case
      *************************************************
     D ToUpper         PR         32000A   Varying
     D  InString                  32000A   Const Varying

      *************************************************
      **  T O U P P E R 
      **  Convert upper to lower case
      *************************************************
     D ToLower         PR         32000A   Varying
     D  InString                  32000A   Const Varying

      /ENDIF

Figure 2: FindReplace, ToUpper, ToLower, and CvtCase Prototypes


      /IF NOT DEFINED(API_ERROR_EX)
      /DEFINE API_ERROR_EX
     D API_ErrEx       DS                  Inz
     D  errKey                       10I 0
     D  errDSLenEx                   10I 0
     D  errDSRtnLenEx                10I 0
     D  errDSMsgIDEx                  7A
     D  errReservedEx                 1A   Inz(X'00')
     D  errCCSID                     10I 0
     D  errOffExc                    10I 0
     D  errExcLen                    10I 0
     D  errExcData                   64A

      /IF NOT DEFINED(API_ERROR)
      /DEFINE API_ERROR
     D API_Error       DS                  Inz
     D  errDSLen                     10I 0
     D  errDSRtnLen                  10I 0
     D  errDSMsgID                    7A
     D  errReserved                   1A   Inz(X'00')
      /ENDIF
      /ENDIF

Figure 3: API Error Data Structure

      /IF NOT DEFINED(CEEDOD)
      /DEFINE CEEDOD
     ** Prototype to call CEEDOD to get the length of parameters
     ** that are passed with OPTIONS(*VARSIZE)
      ** for example:  CallP     GetOpDesc(1 : q_dInfo:q_dType:q_dInf1:q_dInf2
     C*                                 : nParmLen : *OMIT)
     D GetOpDesc       PR                  extproc('CEEDOD')
     D  ParmNum                      10I 0 const
     D  o_descinf                    10I 0
     D  o_datatype                   10I 0
     D  o_descinf1                   10I 0
     D  o_descinf2                   10I 0
     D  ParmLen                      10I 0
     D  o_errcode                    12A   OPTIONS(*OMIT)

     D  q_dInfo        S             10I 0
     D  q_dType        S             10I 0
     D  q_dInf1        S             10I 0
     D  q_dInf2        S             10I 0
      /ENDIF

Figure 4: Get Operational Descriptor (CEEDOD) Prototype

Bob Cozzi is author of the best-selling The Modern RPG IV Language, Fourth Edition as well as RPG TNT: 101 Dynamite Tips 'n Techniques with RPG IV and is host of the i5 Podcast Network, which provides free video and audio podcasts to the i5 community. You can also see him in person at RPG World in May 2007.

BLOG COMMENTS POWERED BY DISQUS