String-Scanning Performance Comparisons

Brief: This is the third in a series of articles (beginning in the September 1993 issue) on improving performance. This article discusses several methods of string scanning to help you determine the best way to search a database or source file.

Scanning the records in a file for a particular string of characters is a common task. Fortunately, the AS/400 gives us several good methods to accomplish it, but the choice matters: depending on a number of factors, the method you choose may perform considerably worse than the alternatives. This article explores the various methods of searching files to help you decide which one to use.

I conducted all the performance tests on a model D02, one of the slower AS/400 models. What is important is not the actual times, but the relative differences among the times produced by the various approaches. I split the tests into two categories: scanning source files and scanning database files.

SCANNING SOURCE FILES

A programmer often needs to scan source to locate something he has already coded. The system does not provide a direct solution, but the programmer can choose from several indirect solutions. I ran a series of tests, shown in Figure 1, which I've named methods A, B, C, D and E. Refer to this figure as I discuss each test's results. In all cases, I scanned a source file with 53 members containing a total of 6,897 statements. Nine members had a statement that matched the scan string.

Method A: FNDSTRPDM

PDM provides the Find String Using PDM (FNDSTRPDM) command, which I find awkward to use because it has so many options. When you prompt for the command, as shown in Figure 2, it is not easy to remember which options to include. If you generate the following statement, the output of FNDSTRPDM shows one line per member.

 FNDSTRPDM STRING(xxx) FILE(yyy) +
           MBR(*ALL) OPTION(*NONE) +
           PRTMBRLIST(*YES)

The listing identifies each member by name, but it does not report how many hits (matching strings) were found, nor does it print the source statements that contain the hits. You would have to scan the source members identified to determine which statements interest you.

On the other hand, if you execute the following statement:

 FNDSTRPDM STRING(xxx) FILE(yyy) +
           MBR(*ALL) OPTION(*NONE) +
           PRTMBRLIST(*YES) PRTRCDS(*ALL)

...you see the individual records and a count of how many hits were found; but you also get a lot of output that may be difficult to read. See Figure 3 for the output generated by this FNDSTRPDM statement.

One nice thing about FNDSTRPDM is that you can execute a PDM option when you find the source member that matches the string you're trying to locate. Personally, I do not find this very practical for my typical source-scanning needs. Since scanning on a sizable source file is slow, you would fare better by running the scan in batch, examining the spooled output and using SEU from that point. Even for small files, you may not always want to execute the same PDM option on each hit.

Method B: SCNSRC

Scan Source (SCNSRC), a TAA tool from the QUSRTOOL library, makes it much easier to specify a scan. The prompt shown in Figure 4 illustrates that the structure of the SCNSRC statement can be as simple as this:

 SCNSRC FILE(xxx) ARGUMENT(yyy) 

SCNSRC prints a simple listing of the individual statements that contain the matching argument. You can often avoid editing the member just by seeing the statement. For example, if you search for a certain argument in your source files, you really want to see where you used that argument in a command or RPG statement, not in a comment. SCNSRC can't make that distinction, but you can when you see the spooled output. An example of the SCNSRC output is shown in Figure 5.

You can use the Scan All Source (SCNALLSRC) TAA tool to submit a batch job for one or more of the standard source files (e.g., QCLSRC). SCNALLSRC runs a separate SCNSRC command for each standard source file.
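The kind of listing SCNSRC produces can be illustrated with a short sketch. This is not the SCNSRC source (the tool is written for the AS/400, not in Python); the function name scan_members, the sample members and the case-sensitive comparison are all assumptions made for illustration.

```python
# Illustrative sketch only: not the SCNSRC source. All names here are hypothetical.
from typing import Dict, List, Tuple


def scan_members(members: Dict[str, List[str]], argument: str) -> List[Tuple[str, int, str]]:
    """Return (member name, statement number, statement) for every matching statement."""
    hits = []
    for name in sorted(members):
        for number, statement in enumerate(members[name], start=1):
            # Plain substring test; case sensitivity is an assumption of this sketch.
            if argument in statement:
                hits.append((name, number, statement))
    return hits


# Hypothetical source file with two members.
source_file = {
    "DSPOBJDL": ["A          R QLIDOBJD                  PFILE(DSPOBJDP)"],
    "NOTES": [
        "A* PFILE is mentioned here only in a comment",
        "A            FLD1          10A",
    ],
}

if __name__ == "__main__":
    for member, seq, text in scan_members(source_file, "PFILE"):
        print(f"{member:10} {seq:4} {text}")
```

Because each hit is listed with its full statement, the reader can tell at a glance that the NOTES hit is only a comment, which is the distinction the scan itself cannot make.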

Method C: SCNSRC with SCAN Op Code

The SCNSRC TAA tool operates as explained, no matter what version of QUSRTOOL you're using. The primary difference between the old version (Method B) and the new version (Method C) lies in the technique used to perform the scan.

The old version of SCNSRC called the system program, QCLSCAN, for every statement read. The December 1993 version uses the RPG SCAN operation code to expedite the process. The scanning probably takes the same amount of time, but the RPG solution avoids calling a program for each scan. (As I showed in "Improving the Performance of Program Calls," MC, November 1993, calling a program repeatedly can require substantial system overhead.)

You can't get the new version of SCNSRC the way you would obtain most other updates, because QUSRTOOL is never changed via program temporary fixes (PTFs) or PTF cumulative packages. The source that is shipped remains unchanged throughout the release. Even if you have installed V2R3, you don't have the most recent version of QUSRTOOL, because the changes in the December update were made after V2R3 was frozen. You receive fixes, enhancements and new tools added to QUSRTOOL through an informal update process. For information on obtaining and installing QUSRTOOL updates, consult "Tips for Managing the QUSRTOOL Library" (MC, October 1993) or have your system engineer send a request to QUSRTOOL at RCHASA04.

Method D: SCNSRCARC

Scan Source Archive (SCNSRCARC) is part of the Source Archive (SRCARC) TAA tool, which captures your source and stores the data from all the archived members in a single member. Before you can use SCNSRCARC, you must build an archive file with the Create Source Archive (CRTSRCARC) QUSRTOOL command. In addition to holding everything in one member, the archive packs multiple lines of your source into each data record and squeezes out the consecutive blanks. These two features of SRCARC files greatly increase the efficiency of SCNSRCARC.

Because there are no consecutive blanks, fewer characters need to be scanned. SCNSRCARC performs OPENs on a single member, rather than a multimember file, resulting in far fewer OPENs. Unfortunately, the command only prints the fact that the argument exists in a member (just like the first use of FNDSTRPDM). You have to look at the member with SEU or one of the Display commands and scan again for the statements that contain the matching string.

As with SCNSRC, the new version of SCNSRCARC updated in December 1993 uses the RPG SCAN op code instead of the QCLSCAN program. The SRCARC utility compacts multiple source statements into a single record, possibly splitting one of your source records over two archive records. As a result, the value you're trying to locate may also be split. The code for this utility accounts for this possibility by including the last 20 bytes from the previous record in the scan of each record.
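The overlap technique can be sketched as follows. This is a minimal illustration, not the SCNSRCARC code: the function name archive_contains and the sample records are hypothetical, and only the 20-byte carry-over comes from the description above. Note that a 20-byte overlap can only guarantee a match for arguments of up to 21 characters, which the sketch checks explicitly.

```python
# Illustrative sketch only: not the SCNSRCARC source. Names are hypothetical.
from typing import Iterable

OVERLAP = 20  # bytes carried over from the previous record, per the article


def archive_contains(records: Iterable[str], argument: str, overlap: int = OVERLAP) -> bool:
    """Return True if argument occurs in the record stream, even when it is
    split across the boundary between two consecutive records."""
    if len(argument) > overlap + 1:
        # A split match can leave up to len(argument) - 1 characters in the
        # previous record, so a longer argument could be missed.
        raise ValueError("argument too long for the overlap to guarantee a match")
    tail = ""
    for record in records:
        # Scan the carried-over tail together with the current record.
        if argument in tail + record:
            return True
        # Keep the last `overlap` characters seen so far for the next pass.
        tail = (tail + record)[-overlap:] if overlap else ""
    return False


if __name__ == "__main__":
    split_records = ["A R QLIDOBJD PFI", "LE(DSPOBJDP) A R FILEJR"]
    print(archive_contains(split_records, "PFILE"))
```

Since SCNSRCARC reports only that the argument exists in a member, the sketch returns a simple yes or no. A version that counted hits would also have to avoid counting a match twice when it lies entirely within the carried-over bytes.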

Method E: OPNQRYF

The Open Query File (OPNQRYF) scan method does not support the MBR(*ALL) function on the Override Database File (OVRDBF) command. Consequently, to use OPNQRYF on source, you have to list the source members via an API, use the Retrieve Member Description (RTVMBRD) command or create an outfile with the Display File Description (DSPFD) command. OPNQRYF does not generate any output to help you determine if it found any records. You could call a program and perform a READ operation or use the Copy from Query File (CPYFRMQRYF) command.

To run a test using OPNQRYF, I used DSPFD to create an outfile and then ran OPNQRYF in a loop on each member. I used the CONTAINS function (*CT) to perform the OPNQRYF scan and called an RPG program after each OPNQRYF. The program executed a user-controlled OPEN followed by a READ. Any matching records were listed in a manner similar to SCNSRC. Upon reaching end-of-file, the program closed the database member (with the CLOF op code) so that it could open the next member after OPNQRYF was used again. The printer file was kept open until all members had been read.

The Performance Results

The test results in Figure 1 show that:

o SCNSRCARC (Method D) is the fastest solution if you're scanning source archived with CRTSRCARC. This is because it performs a single OPEN and has fewer bytes to scan. It's more than twice as fast as SCNSRC (8.4 versus 20.9 CPU seconds for Method C) but doesn't provide as much detail. The bigger your source files (in terms of both members and statements), the better SCNSRCARC will look in comparison to the other methods.

o OPNQRYF (Method E) provides a very good scan facility, but it creates a good deal of overhead before it gets going. It isn't very efficient on a member with a small number of records and it doesn't allow the MBR(*ALL) option of OVRDBF. Therefore, OPNQRYF is not a good choice on source files.

o The QUSRTOOL SCNSRC command in Method C performs better overall than FNDSTRPDM. The SCNSRC command is much easier to work with, the output is more detailed and the performance is just as good or slightly better.

o The SCNSRC command produces quicker results when it makes use of the RPG SCAN op code (Method C) instead of the QCLSCAN program in Method B.

SCANNING DATA FILES

In the test summarized in Figure 1, OPNQRYF (Method E) did not work well because of the large number of members (each containing a small number of records) and because no access paths could be used to assist the performance of OPNQRYF. These conditions are typical of source files, but some opposite conditions tend to characterize data files. A data file usually consists of a single member which contains much more data than a source member. Also, an access path that can assist in the search may already exist.

Scanning data files offers OPNQRYF a better chance to perform well. Of course, OPNQRYF can excel when you use an existing access path to help select the records. To analyze the effect that an access path can have on OPNQRYF's performance, I ran two sets of tests: one set without the aid of an existing access path and another set that does use one.

Scanning When No Access Path Exists

In the first set of tests, I compared Method E (OPNQRYF) and Method C (the new version of SCNSRC that uses the RPG SCAN op code) against an 80-byte search argument. Normally, you cannot run SCNSRC against a database file; but for testing purposes, it was a simple way to measure the efficiency of the RPG SCAN op code. I ran several of these tests, varying the number of records. The file, which contains a number of 100-byte records, was read in arrival sequence and each test located and printed 1-2 percent of the records. No access path existed for OPNQRYF to use.

I did not try Method B (the QCLSCAN program used with the old version of SCNSRC). The results from the test on source files indicate that the RPG SCAN op code is the better SCNSRC choice.

Because the file is being read in arrival sequence, the RPG program uses blocking by default. With a record size of 100 bytes, the default block holds approximately 40 records. Arrival-sequence input processing is very efficient on the system when blocking is used. With arrival-sequence input processing, the high-level language can select records as fast as, or faster than, OPNQRYF.
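The 40-record figure follows from simple division. The sketch below assumes a 4,096-byte block buffer; that size is inferred from the numbers quoted above and is not stated in this article, and the function names are hypothetical.

```python
# Illustrative arithmetic only. ASSUMED_BLOCK_BYTES is an assumption inferred
# from "approximately 40 records" at 100 bytes per record.
ASSUMED_BLOCK_BYTES = 4096


def records_per_block(block_bytes: int, record_length: int) -> int:
    """Whole records that fit in one block."""
    return block_bytes // record_length


def block_transfers(record_count: int, per_block: int) -> int:
    """Number of block transfers needed to read record_count records."""
    return -(-record_count // per_block)  # ceiling division


if __name__ == "__main__":
    per_block = records_per_block(ASSUMED_BLOCK_BYTES, 100)
    for count in (100, 2000, 10000, 50000):
        print(count, "records:", block_transfers(count, per_block), "block transfers")
```

Under that assumption, reading 50,000 records takes 1,250 block transfers instead of 50,000 individual record transfers, which is why blocked arrival-sequence input carries so little overhead.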

The Performance Results

A few observations arise from the test results in Figure 6.

o The RPG SCAN op code performs quite well when processing data files without the aid of an existing access path, regardless of the number of records being processed.

o OPNQRYF, in contrast, seems designed to operate on lots of volume and therefore doesn't make a lot of sense on a handful of records.

o Although effective in this situation, OPNQRYF's performance is generally equaled or bettered by the use of the RPG SCAN operation. This is particularly true under certain circumstances: when no existing access path can be used; when you use arrival-sequence, input-only processing; and when you are not dealing with large volumes.

o OPNQRYF suffers by comparison because arrival-sequence input processing is already very fast on the system (very little overhead). OPNQRYF can't offer much improvement when there is little overhead to be avoided.
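One way to read Figure 6 is to look at the CPU gap between the two methods at each file size. The sketch below only restates the figure's CPU seconds and subtracts them; the names figure_6 and cpu_gap are hypothetical.

```python
# CPU seconds from Figure 6: record count -> (RPG SCAN op code, OPNQRYF).
figure_6 = {
    100: (1.8, 3.7),
    2000: (3.3, 4.4),
    10000: (12.7, 13.1),
    50000: (59.8, 59.4),
}


def cpu_gap(results):
    """Return {record count: OPNQRYF CPU seconds minus RPG SCAN CPU seconds}."""
    return {count: round(opnqryf - scan, 1) for count, (scan, opnqryf) in results.items()}


if __name__ == "__main__":
    for count, gap in cpu_gap(figure_6).items():
        print(f"{count:>6} records: OPNQRYF - SCAN = {gap:+.1f} CPU seconds")
```

The gap shrinks from 1.9 CPU seconds at 100 records to 1.1, then 0.4, and finally reverses slightly (0.4 in OPNQRYF's favor) at 50,000 records. That pattern is consistent with a roughly fixed start-up cost for OPNQRYF that matters less as the volume grows.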

Scanning When an Access Path Exists

My second set of tests with data files shows an example of effective OPNQRYF usage. In this case, the file has 100,000 records, and I varied both whether an access path existed and the percentage of matches found for the search argument.

In each test, the program that processes the data is very simple (it only counts the hits). The third and fourth tests have an access path over the field being scanned. The first three tests all find 1,000 hits, or 1 percent of the file. The fourth test finds only 50 records, or 0.05 percent of the records in the file.

The Performance Results

Figure 7 contains a summary of the performance results, which lead to some conclusions:

o When no access path exists for OPNQRYF to use, it performs about as well as the RPG SCAN op code.

o When OPNQRYF does have an access path it can use, it outperforms the RPG SCAN op code hands down. Even when an access path exists, however, OPNQRYF will use arrival sequence if too high a percentage of the file is read (approximately 20 percent or more).

o When a smaller percentage of records is selected by OPNQRYF, it performs even better.
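The trade-off described in these conclusions can be sketched in a few lines. This is a deliberately simplified model, not how OPNQRYF actually chooses an access method: it selects on an exact key value rather than a contains scan, the sorted list merely stands in for a keyed access path, and the names build_access_path and select are hypothetical. Only the 20 percent threshold comes from the text above.

```python
# Simplified model only: the real OPNQRYF optimizer is far more involved.
import bisect
from typing import List, Optional, Tuple

ARRIVAL_THRESHOLD = 0.20  # "approximately 20 percent or more", per the article

Record = Tuple[str, str]  # (key field, rest of record)


def build_access_path(records: List[Record]) -> List[Tuple[str, int]]:
    """Sorted (key, relative record number) pairs standing in for a keyed access path."""
    return sorted((key, rrn) for rrn, (key, _) in enumerate(records))


def select(records: List[Record], wanted: str,
           access_path: Optional[List[Tuple[str, int]]] = None):
    """Return (method used, matching records)."""
    if access_path is not None and records:
        # Binary search locates the range of entries whose key equals `wanted`.
        lo = bisect.bisect_left(access_path, (wanted, -1))
        hi = bisect.bisect_right(access_path, (wanted, len(records)))
        if (hi - lo) / len(records) < ARRIVAL_THRESHOLD:
            return "keyed", [records[rrn] for _, rrn in access_path[lo:hi]]
    # No access path, or too many hits: read every record in arrival sequence.
    return "arrival", [rec for rec in records if rec[0] == wanted]


if __name__ == "__main__":
    data = [("RARE", "r0")] + [("COMMON", f"r{i}") for i in range(1, 10)]
    path = build_access_path(data)
    print(select(data, "RARE", path)[0])
    print(select(data, "COMMON", path)[0])
```

With the keyed method, the work is proportional to the number of hits rather than the size of the file, which is in line with the last conclusion: the smaller the percentage of records selected, the better the access path looks.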

CHOOSING THE BEST METHOD

As you have seen by reading this article, there are many options available to perform searches on source and database files. Numerous factors influence whether one method will perform better than another. You can determine the best method only through careful consideration of the factors involved.

Which technique you choose can make a big difference in terms of performance. Hopefully, by studying these test results, you will be able to make a more informed decision the next time you need to perform this task. In an upcoming article, we'll look at processing by key, sorting, and processing for update, where OPNQRYF can also be a significant advantage.

Jim Sloan is president of Jim Sloan, Inc., a consulting company. Now a retired IBMer, Sloan was a software planner on the S/38 when it began as a piece of paper. He also worked on the planning and early releases of AS/400. In addition, Jim wrote the TAA tools that exist in QUSRTOOL. He has been a speaker at COMMON and the AS/400 Technical Conferences for many years.

REFERENCES

"Improving the Performance of Program Calls," MC, November 1993.

"The Truth About RPG Performance Coding Techniques," MC, September 1993.


Figure 1 Scanning Source

 Method  Description of Method                          CPU Seconds  Job Seconds
 A       FNDSTRPDM                                      22.5         41
 B       Old version of SCNSRC (using QCLSCAN)          26.8         40
 C       New version of SCNSRC (using SCAN op code)     20.9         37
 D       New version of SCNSRCARC (using SCAN op code)  8.4          17
 E       DSPFD *MBRLIST and OPNQRYF                     60.0         84

Figure 2 The FNDSTRPDM Prompt

                     Find String Using PDM (FNDSTRPDM)

 Type choices, press Enter.

 Find 'string' . . . . . . . . .   ___________________________________________
 File . . . . . . . . . . . . . .  ___________   Name
   Library . . . . . . . . . . .     *LIBL       *LIBL, *CURLIB, name
 Member . . . . . . . . . . . . .  ___________   *ALL, name, *generic*
               + for more values   ___________
 Operation to perform:
   Option . . . . . . . . . . . .  _______       Character value, *EDIT...
   Prompt . . . . . . . . . . . .  *NOPROMPT     *NOPROMPT, *PROMPT

                           Additional Parameters

 Columns to search:
   From column . . . . . . . . .   1             1 - *RCDLEN
   To column . . . . . . . . . .   *RCDLEN       1 - *RCDLEN
 Kind of match . . . . . . . . .   *IGNORE       *IGNORE, *MATCH
 Print list . . . . . . . . . . .  *NO           *NO, *YES
 Print records:
   Number to find . . . . . . . .  *NONE         *NONE, *ALL, number
   Print format . . . . . . . . .  _______       *CHAR, *HEX, *ALTHEX
   Mark record . . . . . . . . .   ______        *MARK, *NOMARK
   Record overflow . . . . . . .   _________     *FOLD, *TRUNCATE
 Parameters . . . . . . . . . . .  ___________________________________________
 _______________________________________________________________________________
                                                                         Bottom
 F3=Exit   F4=Prompt   F5=Refresh   F12=Cancel   F13=How to use this display
 F24=More keys

Figure 3 FNDSTRPDM Output

 5738PW1 V2R2M0 920925    Programming Development Manager    10/28/93 13:29:04    Page 1
 File . . . . . . . . :   QDDSSRC
   Library . . . . . . :    SLOANT
 Find . . . . . . . . :   PFILE
 From column . . . . . :  1
 To column . . . . . . :  *RCDLEN
 Kind of match . . . . :  2          1=Same case, 2=Ignore case
 Number to find . . . :   *ALL
 Print format . . . . :   *CHAR
 Mark record . . . . . :  Y          Y=Yes, N=No
 Record overflow . . . :  1          1=Fold, 2=Truncate
 _______________________________________________________________________________
 Member . . . . . . . :   DSPOBJDL                    Creation date . . . . . . :  06/05/93
 Type . . . . . . . . :   LF                          Last changed date . . . . :  06/05/93
 Text . . . . . . . . :   DSPOBJD LF by object name   Last changed time . . . . :  12:31:00
 Record length . . . . :  92                          Number of records . . . . :  2
 SEQNBR *...+....1....+....2....+....3....+....4....+....5....+....6....+....7....+....8....+....9....+....100  Last Changed Date
        PFILE
    100 A R QLIDOBJD PFILE(DSPOBJDP)                                                                            11/01/92
 Number of records searched . . . . . . . . . . . :  2
 Number of records to find . . . . . . . . . . . . : *ALL
 Number of records found . . . . . . . . . . . . . : 1
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ E N D O F M E M B E R _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 Member . . . . . . . :   FILEJL                      Creation date . . . . . . :  06/05/93
 Type . . . . . . . . :   LF                          Last changed date . . . . :  06/05/93
 Text . . . . . . . . :   Logical over FILEJ          Last changed time . . . . :  12:31:11
 Record length . . . . :  92                          Number of records . . . . :  3
 SEQNBR *...+....1....+....2....+....3....+....4....+....5....+....6....+....7....+....8....+....9....+....100  Last Changed Date
        PFILE
    100 A R FILEJR PFILE(FILEJ)                                                                                 07/31/92
 Number of records searched . . . . . . . . . . . :  3
 Number of records to find . . . . . . . . . . . . : *ALL
 Number of records found . . . . . . . . . . . . . : 1
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ E N D O F M E M B E R _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Figure 4 The SCNSRC Prompt

                        Scan Source - TAA (SCNSRC)

 Type choices, press Enter.

 File name . . . . . . . . . . .                 Name, *CBL, *CL, *CLP...
   Library name . . . . . . . . .    *LIBL       Name, *LIBL
 Argument to scan for . . . . . .
 Member name . . . . . . . . . .   *ALL          Name, *ALL
 Wild character or blank . . . .                 Character value

                                                                         Bottom
 F3=Exit   F4=Prompt   F5=Refresh   F12=Cancel   F13=How to use this display
 F24=More keys

Figure 5 SCNSRC Output

 UNABLE TO REPRODUCE GRAPHICS 

Figure 6 Scanning Data Without an Existing Access Path

 Number of Records  Method Used              CPU Seconds  Job Seconds
 100                C - (RPG SCAN op code)   1.8          4
 100                E - (OPNQRYF)            3.7          6
 2,000              C - (RPG SCAN op code)   3.3          5
 2,000              E - (OPNQRYF)            4.4          7
 10,000             C - (RPG SCAN op code)   12.7         14
 10,000             E - (OPNQRYF)            13.1         16
 50,000             C - (RPG SCAN op code)   59.8         65
 50,000             E - (OPNQRYF)            59.4         68

Figure 7 Scanning Data with an Existing Access Path

 Method Used                  CPU Seconds  Job Seconds
 RPG SCAN op code             93.3         106
 OPNQRYF (no access path)     98.6         113
 OPNQRYF (access path 1%)     10.2         31
 OPNQRYF (access path .05%)   4.0          13