Improving the Performance of Program Calls

Brief: This is the second in a series of articles (beginning in the September 1993 issue) on improving AS/400 performance. In this installment, we evaluate performance comparisons for various types of program calls. As you will see, some perform better than others. This article will help you make the right decisions about which methods to use in your applications.

Calling subprograms to perform specific tasks in an application is a very good coding practice. It makes applications easier to maintain by promoting modular, reusable code. There are, however, performance costs associated with using this application design. The cost to call a program is not the same in all cases.

In this article, I present some performance tests to distinguish the quickest methods to use. These tests were conducted on a Model D02 running V2R2M0. On a faster CPU, the times will be much lower. What is important here is not the absolute values but the relative differences. I varied the type of program that did the call and the type of program called. I'll talk about the cost of calling programs under V2R2M0, and offer some comments on the effect that V2R3's Integrated Language Environment (ILE) will have on performance.

None of the methods I experimented with can make much difference if you call a program once during your application. As a result, this article focuses on cases in which you call the same program repeatedly. It is this situation that presents you with the opportunity to make significant performance improvements.

Calling from CL to CL

I ran the series of tests shown in Figure 1 to determine how much overhead is involved in calling a CL program from another CL program. Read Figure 1 now and follow it (and subsequent figures, when referenced) closely to grasp the comparisons I make throughout the article. This first figure is the most complex, so I've named the tests here to reduce the amount of cross-referencing you need to do.

Case 1 = Base case with CHGVAR
Case 2 = 1,000 calls to do CHGVAR
Case 3 = Base case with SNDMSG
Case 4 = 1,000 calls to do SNDMSG
Case 5 = Case 4 with QCMDEXC
Case 6 = Case 3 with CL logging

According to my tests, the difference between cases 1 and 2 is roughly 15 seconds of CPU time. Invoking a CL program creates a considerable amount of system overhead. A CL program cannot remain active the way an RPG program can by returning with the last record indicator (LR) off. Every time you call a CL program, it has to be started again, and the cost is not trivial.

The difference between cases 1 and 2 is nearly the same as the difference between cases 3 and 4, indicating that a call from CL to CL takes about .015 seconds. Since the difference between the job time and the CPU time is approximately the same in the first two cases, there is little disk overhead if the program is already in main memory.

In case 3, adding the SNDMSG command caused an additional 49 seconds of CPU time over case 1. Therefore, the SNDMSG command's .049 seconds of overhead per use is much more expensive than the CHGVAR command used in case 1. Depending on what command you use, the cost to execute varies significantly.

You'll also notice a marked difference between the job time and the CPU seconds when the SNDMSG command is executed. SNDMSG causes a significant amount of disk I/O. The job time for cases 3-6 exceeds the CPU time by roughly 16 to 17 seconds, or about .016 to .017 seconds per SNDMSG. So, depending on what command you are executing, you can expect a big difference in both CPU and job time.

The difference between cases 4 and 5 may surprise you. The general-purpose QCMDEXC program analyzes the command, extracts the parameters and then invokes the same command processing program (CPP) that was executed for SNDMSG in case 4. That extra analysis is why executing through QCMDEXC costs more (about 15 CPU seconds over 1,000 calls) than calling a subprogram with the command already coded.

Those people who tend to log everything should not assume they can do so with minimal performance impact. Case 6 is the same as case 3 except that LOGCLPGM(*YES) is specified on the Submit Job (SBMJOB) command. The commands that are logged are turned into messages and sent to the job message queue. At the end of the job, you can choose to convert the job message queue into a job log. In this test, no job log was spooled. The extra time the job took was only the time to write the messages to the job message queue. Logging 1,000 commands in this manner ate up an additional 13 seconds of CPU time.
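
The per-iteration costs quoted in this section come from simple subtraction against the base cases. The following Python sketch is only a worked version of that arithmetic: the CPU and job seconds are copied from Figure 1, while the names (FIGURE_1, cpu_delta_per_iteration, wait_per_iteration) are hypothetical and were not part of the original tests.

```python
# (CPU seconds, job seconds) per test case, copied from Figure 1.
FIGURE_1 = {
    1: (0.6, 1),    # one call, CHGVAR looped 1,000 times (base case)
    2: (15.7, 17),  # 1,000 calls, one CHGVAR each
    3: (50.0, 67),  # one call, SNDMSG looped 1,000 times (base case)
    4: (64.3, 81),  # 1,000 calls, one SNDMSG each
    5: (79.2, 95),  # case 4 with SNDMSG run through QCMDEXC
    6: (63.2, 79),  # case 3 with LOGCLPGM(*YES)
}
ITERATIONS = 1000


def cpu_delta_per_iteration(case, base):
    """CPU seconds per iteration that `case` costs beyond `base`."""
    return (FIGURE_1[case][0] - FIGURE_1[base][0]) / ITERATIONS


def wait_per_iteration(case):
    """Job seconds not spent on the CPU, per iteration."""
    cpu, job = FIGURE_1[case]
    return (job - cpu) / ITERATIONS


costs = {
    "CL-to-CL call (CHGVAR)": cpu_delta_per_iteration(2, 1),
    "CL-to-CL call (SNDMSG)": cpu_delta_per_iteration(4, 3),
    "SNDMSG over CHGVAR": cpu_delta_per_iteration(3, 1),
    "QCMDEXC over coded command": cpu_delta_per_iteration(5, 4),
    "LOGCLPGM(*YES) logging": cpu_delta_per_iteration(6, 3),
}

for name, seconds in costs.items():
    print(f"{name:28s} {seconds:.4f} CPU seconds")
print(f"SNDMSG non-CPU wait (case 4) {wait_per_iteration(4):.4f} seconds")
```

Substituting timings from your own system for the Figure 1 values is a quick way to see whether the same relationships hold on a faster model.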

Conclusions:

o The cost to invoke a CL program from CL is not trivial. In many cases, a CL program executes only once and then your high-level language (HLL) program executes for a long time. If this is your case, you can't improve the overall job time much by changing the way you call the CL program or your HLL program.

o Commands that deal with external objects (e.g., message queues) incur both CPU time and disk I/O time. It is nontrivial to execute a lot of them.

o If you have to repeatedly invoke a CL function from your HLL program, you are better off calling a CL program rather than using QCMDEXC.

o Logging of CL programs is not free. Be sure you have a good reason to use LOGCLPGM(*YES).

Calling from CL to RPG

Because RPG can return control to the calling program with LR off (meaning the RPG program is still active), there are differences in calling an RPG program as opposed to a CL program. I ran the tests shown in Figure 2 to analyze the overhead associated with calling an RPG program from a CL program.

The difference between the first two cases in Figure 2 is roughly 35 seconds, or a cost of about .035 seconds per call to an RPG program that sets LR on before the return.

Case 3 shows the benefit of returning with LR off. If you are going to use the program repetitively, the overhead is cut by more than half (from roughly .035 to .014 seconds per call) if you keep the program active by returning with LR off. This RPG program did not open any files, and it is a very simple program. If you open files and have a large program (lots of fields, for instance), the time required to initialize the program becomes even greater.
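
As a rough check of the LR-on versus LR-off comparison, the per-call overhead can be worked out from the Figure 2 CPU seconds. This Python sketch assumes that the base case (case 1) accounts for the work itself, so that whatever remains is call overhead; the function name per_call_overhead is made up for illustration.

```python
CALLS = 1000

# CPU seconds copied from Figure 2.
BASE_CPU = 0.5     # case 1: one call, trivial function looped 1,000 times
LR_ON_CPU = 35.7   # case 2: 1,000 calls, RPG program returns with LR on
LR_OFF_CPU = 14.8  # case 3: 1,000 calls, RPG program returns with LR off


def per_call_overhead(cpu_seconds, base=BASE_CPU, calls=CALLS):
    """CPU seconds of call overhead per call, net of the base case."""
    return (cpu_seconds - base) / calls


lr_on = per_call_overhead(LR_ON_CPU)
lr_off = per_call_overhead(LR_OFF_CPU)
ratio = lr_on / lr_off

print(f"LR on:  {lr_on:.4f} CPU seconds per call")
print(f"LR off: {lr_off:.4f} CPU seconds per call")
print(f"LR on costs {ratio:.1f} times as much as LR off")
```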

Conclusion: The RPG capability to return with LR off is very significant. At any place in your application where you repetitively use the same RPG program, the advantage of having the program already active makes a positive impact on performance.

Unqualified vs. Qualified Program Calls

Most users specify an unqualified call when they call a program, so the library defaults to *LIBL. For example, the following are equivalent:

CALL PGM(PGMX)
CALL PGM(*LIBL/PGMX)

When an unqualified call is made, the system has to search the library list for the program.

The tests shown in Figure 3, all of which execute calls from CL to RPG, illustrate the effect of calls to different locations on the library list as compared to the effect of a qualified call.

Call performance can vary significantly, depending on where the program is found on the library list. Keeping the program in the current library closely approximates the use of a qualified call. As the program is found lower and lower in the library list, each call takes longer. If you want the best performance from repetitive calls, you have to sacrifice the flexibility of the *LIBL default or make sure the program you are calling is high on the library list. The system library list (system value QSYSLIBL) also affects library searches, so try to keep that list short.
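
One way to read Figure 3 is as a cost per library searched. The Python sketch below works that out from the published CPU seconds, assuming for illustration only that the search cost grows roughly linearly with the number of user libraries that precede the program. The names (UNQUALIFIED, extra_per_call, cost_per_library) are hypothetical.

```python
CALLS = 1000

# CPU seconds for 1,000 CL-to-RPG calls (LR off), copied from Figure 3.
QUALIFIED = 12.8
UNQUALIFIED = {
    0: 14.8,   # program in the current library
    10: 24.0,  # program in the 10th library of the user portion
    20: 32.4,  # program in the 20th library of the user portion
}


def extra_per_call(position):
    """CPU seconds per call that an unqualified call adds over a qualified one."""
    return (UNQUALIFIED[position] - QUALIFIED) / CALLS


def cost_per_library(first, last):
    """Average CPU seconds per call for each library between two positions."""
    delta = UNQUALIFIED[last] - UNQUALIFIED[first]
    return delta / (last - first) / CALLS


for position in sorted(UNQUALIFIED):
    print(f"user library position {position:2d}: "
          f"+{extra_per_call(position):.4f} CPU seconds per call")
print(f"per library, positions 0 to 10:  {cost_per_library(0, 10):.5f}")
print(f"per library, positions 10 to 20: {cost_per_library(10, 20):.5f}")
```

On these numbers, each additional user library ahead of the program adds a little under a thousandth of a CPU second to every unqualified call from CL on the test machine.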

Conclusion: For the quickest call performance from CL, a qualified call should be used.

Calling from RPG to RPG

When you call from RPG to RPG, a pointer is saved which indicates where the program was found. (COBOL uses the same technique.) I tried an RPG-to-RPG call (Figure 4) using a subprogram which ended with LR off. I used a variety of techniques for specifying the program name, including a literal, a field name, an unqualified named constant and a qualified named constant. The results were essentially the same.

In fact, the results were so fast that I had to increase the number of iterations from 1,000 to 10,000 to show something meaningful. Even then, I was not sure I believed the results, so I inserted some special code in one of my tests to ensure it was being called 10,000 times.

The test results in Figure 4 average to .0015 CPU seconds per call (versus the .015 per call average of CL to RPG). Obviously, the RPG-to-RPG call executes very quickly. Because the pointer to the program is saved and used for subsequent calls, the ability to have a subprogram which returns with LR off costs far less than you would imagine. It's approximately 10 times faster than calling from CL to RPG.

Conclusion: The ability to retain the pointer for subsequent calls makes the RPG-to-RPG call an excellent performer.

Calling from RPG to CL

Calling from RPG to CL also affords you the benefit of saving the pointer. Consequently, there is no real performance impact in using a qualified name. The CL program I used contained only a single CHGVAR command. Once again, I had to use 10,000 iterations to provide a meaningful answer. I ran two variations of this test, as shown in Figure 5.

The fact that RPG saves the pointer of the program produces about a 4 to 1 improvement over the CL-to-RPG call. The alternative shown in the second case, returning from the RPG program to CL and then calling the still-active RPG program again, is nowhere near as effective. The second case actually reproduces the results we saw when a CL program called an RPG program 1,000 times with LR off; we are just looking at 10 times the number of calls.

The saving of the pointer by RPG (or COBOL) has a significant payoff. Calling from RPG to CL is much more cost-effective than either CL-to-CL or CL-to-RPG calls.
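
The ratios quoted for calls that save the pointer can be reproduced from the totals in Figures 2, 4 and 5. The Python sketch below divides total CPU seconds by the number of calls; because Figures 4 and 5 report no base case, these per-call values include whatever loop overhead the tests carried, so treat them as approximations. The dictionary keys and the per_call helper are hypothetical names.

```python
# (total CPU seconds, number of calls), copied from Figures 2, 4 and 5.
TESTS = {
    "RPG to RPG, LR off": (15.0, 10_000),            # Figure 4
    "RPG to CL": (35.7, 10_000),                     # Figure 5, case 1
    "CL recalls active RPG": (146.1, 10_000),        # Figure 5, case 2
    "CL to RPG, LR off": (14.8, 1_000),              # Figure 2, case 3
}


def per_call(name):
    """Average CPU seconds per call for the named test."""
    cpu_seconds, calls = TESTS[name]
    return cpu_seconds / calls


rpg_to_rpg_gain = per_call("CL to RPG, LR off") / per_call("RPG to RPG, LR off")
rpg_to_cl_gain = per_call("CL recalls active RPG") / per_call("RPG to CL")

for name in TESTS:
    print(f"{name:24s} {per_call(name):.5f} CPU seconds per call")
print(f"RPG-to-RPG is about {rpg_to_rpg_gain:.1f} times faster than CL-to-RPG")
print(f"RPG-to-CL is about {rpg_to_cl_gain:.1f} times faster than returning to CL")
```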

Conclusion: When you need a CL function from your HLL program, the best thing you can do is call a CL program.

Command vs. Call

When you execute a command, the system must access the command definition object. Then the system checks each parameter passed on the command against the corresponding value in the command definition object to determine if it is valid, if a default needs to be inserted, if a conversion must be made, and so on. Then the CPP is called to do the work. Some command definition functions may call other programs such as a validity checking program (VCP) or a prompt override program (POP) before invoking the CPP.

What impact does this process have and how does it compare to calling the CPP directly? Although you can't invoke the CPP directly for system commands (at least, you shouldn't and I certainly don't recommend it), you can bypass a command interface for user-written commands.

Some good examples of this are the Send to Data Queue (SNDDTAQ) and Scan Variable (SCNVAR) tools in QUSRTOOL. The SNDDTAQ tool supports two commands: SNDDTAQ and RCVDTAQ. Both serve as front ends for system programs (QSNDDTAQ and QRCVDTAQ). SCNVAR front-ends the system program QCLSCAN.

The purpose of the QUSRTOOL commands is to simplify the user interface, but what performance impact should you expect when you use them on a repetitive basis? In this test, I did 50 'sends' to a data queue and then 50 'receives,' first by executing the appropriate command and then by calling the system program (QSNDDTAQ or QRCVDTAQ) directly. I also executed 50 scans of an 80-byte variable (again using the command or calling QCLSCAN directly), looking for an asterisk (*) which was found in position 50. The QUSRTOOL commands are simple (e.g., no VCP or POP) and the CPP executes a single CL program. Figure 6 contains the test results.

The overhead cost for executing a typical kind of command appears to be about 6 seconds for 50 iterations or .120 seconds per execution. This is definitely not a cheap function.
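
The figure of roughly .120 seconds per execution is an average across the three tools. The Python sketch below shows the subtraction for each command and its direct-call counterpart, using the CPU seconds from Figure 6; the names FIGURE_6 and command_overhead are hypothetical.

```python
ITERATIONS = 50

# (command CPU seconds, direct-call CPU seconds), copied from Figure 6.
FIGURE_6 = {
    "SNDDTAQ vs. QSNDDTAQ": (7.6, 1.3),
    "RCVDTAQ vs. QRCVDTAQ": (7.4, 1.2),
    "SCNVAR vs. QCLSCAN": (6.8, 0.9),
}


def command_overhead(pair):
    """CPU seconds per execution added by going through the command."""
    command_cpu, call_cpu = FIGURE_6[pair]
    return (command_cpu - call_cpu) / ITERATIONS


overheads = {pair: command_overhead(pair) for pair in FIGURE_6}
average = sum(overheads.values()) / len(overheads)

for pair, seconds in overheads.items():
    print(f"{pair:22s} {seconds:.3f} CPU seconds per execution")
print(f"average                {average:.3f} CPU seconds per execution")
```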

I still like the programmer productivity of being able to specify a user command and the way commands document the parameters as opposed to a call with a long parameter list. So if you are only going to do a few of these commands, it's probably worth it.

Conclusion: User-written commands can be very helpful to users and very handy from a programmer productivity and documentation point of view; however, they do cost you in terms of performance. If you plan to execute the functions repeatedly, you should use a direct call to the CPP.

Review of Repetitive Accessing of Code

Let's review the numbers we just looked at and also the numbers I covered in "The Truth About RPG Performance Coding Techniques" (MC, September 1993), which discussed the cost of using an RPG subroutine.

Assuming you need to periodically execute a series of instructions, the list I've developed below shows the performance order (fastest to slowest) of techniques you can use in various situations. Before you modify all your code to make it adhere to these performance recommendations, make sure you take all things into consideration. In other words, balance the performance adjustments you make with a concern for programmer productivity and the clarity and maintainability of code.

Now, if you are in a CL program and want to execute CL commands repetitively:

1. Use in-line code.

2. Include instructions to simulate a subroutine in a CL program. You can do this with the GOTO command. For an example, see the CLPSUBR member in QATTCL in QUSRTOOL. It contains sample code you can copy to make a subroutine.

3. Call a CL program. A qualified call is the fastest performer. Sometimes this is the only way to get around the CL compiler restriction that only a single file can be read and that the file cannot be reopened once it reaches end-of-file.

4. Execute the commands using QCMDEXC.

If you are in an RPG program and need to execute CL commands repeatedly, the following recommendations apply under V2R2M0 which, of course, served as my test environment. With the advent of V2R3 and ILE, I expect some of these performance techniques to change (as I discuss in the next section).

1. Call a CL program. A qualified call is the fastest performer. There is an exception to this with the use of certain commands that are scoped to the program stack level. The Override commands (OVRxxx) are the classic example of this. If you call a CL program to do an override, the system throws away the override when you return.

2. Execute using QCMDEXC. This is the way you can get an override command in your RPG program before opening the file. An astute question is, "Why does this work when calling a CL program does not?" Obviously, both methods cause another program to come into the program stack. In the case of QCMDEXC, however, the override does not go away on the return due to some special code which QCMDEXC contains.

If you are in an RPG program and want to repetitively execute RPG instructions:

1. Use in-line code.

2. Use an RPG subroutine. In the September installment, I showed that this was not free. It cost .7 seconds for 50,000 iterations (an average of .000014 seconds each).

3. Use an RPG-to-RPG call in which the subprogram returns with LR off. At about .0015 seconds per call, this is roughly 100 times slower than using a subroutine.

4. Use an RPG-to-RPG call in which the subprogram returns with LR on. This has more than twice as much overhead as keeping LR off on the return.
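
To put the subroutine and call costs on the same scale, the Python sketch below estimates the total overhead for a given number of invocations. The per-invocation values come from the figures quoted in this series (measured on the Model D02); the names PER_INVOCATION and estimated_overhead are hypothetical, and the result is only a rough planning estimate, not a measurement.

```python
# Per-invocation CPU seconds on the test machine, from figures quoted in the text.
PER_INVOCATION = {
    "RPG subroutine": 0.7 / 50_000,            # .7 seconds for 50,000 iterations
    "RPG-to-RPG call, LR off": 15.0 / 10_000,  # Figure 4
}


def estimated_overhead(technique, invocations):
    """Estimated CPU seconds of overhead for the given number of invocations."""
    return PER_INVOCATION[technique] * invocations


ratio = (PER_INVOCATION["RPG-to-RPG call, LR off"]
         / PER_INVOCATION["RPG subroutine"])

for technique in PER_INVOCATION:
    total = estimated_overhead(technique, 50_000)
    print(f"{technique:24s} {total:6.1f} CPU seconds for 50,000 invocations")
print(f"call with LR off costs about {ratio:.0f} times a subroutine")
```

The absolute gap only matters when the invocation count is large, which is why the recommendation to balance performance against maintainability still applies.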

If you are in a CL program and need to repetitively execute RPG instructions:

1. Call an RPG program which returns with LR off.

2. Call an RPG program which returns with LR on.

Any call from CL doesn't save the pointer to the program. Saving the pointer is the major performance advantage that both RPG and COBOL have.

What About ILE?

The ILE approach available in V2R3 will eventually be supported by RPG (a statement of direction exists). Although the current RPG-to-RPG call is about 100 times slower than a subroutine, it is already fast enough that you have to wonder how much better a call can get with ILE. If your RPG program already calls several subprograms that return with LR off, you won't find much performance blood left to squeeze out with ILE.

For instance, the Model D02 which hosted the tests I've conducted is about 50 times slower than a Model F95. If ILE cuts the CALL overhead in half, you would need to execute nearly 300,000 RPG-to-RPG calls on an F95 to save one second of CPU time.

ILE is changing some of the rules that have existed for 15 years, ever since the introduction of the S/38. For example, your override commands were always lost when the program that executed the override exited the stack. It now appears that ILE will support new options on the override command to let you control how the override is scoped.

Currently, RPG does not have a method of specifying a member name and thus requires an override. Supposedly, ILE RPG will allow a member name (either as a literal or as a field name) so that the user can avoid all the override discussions.

At this point, it is not clear that there will be a significant performance advantage with ILE versus what you can do today for an RPG-to-RPG call. When ILE RPG appears, be sure you see some apples-to-apples performance comparisons before you make a big investment in converting your existing applications for performance reasons.

In any event, I'm looking forward to ILE RPG because of the language enhancements that will be available (e.g., 10-character field names, better file description specifications). Supposedly, several new functions will also provide better control over tasks that occur within a job. ILE will allow some different application approaches and should be very attractive in some complex situations.

Jim Sloan is president of Jim Sloan, Inc., a consulting company. Now a retired IBMer, Sloan was a software planner on the S/38 when it began as a piece of paper. He also worked on the planning and early releases of AS/400. In addition, Jim wrote the TAA tools that exist in QUSRTOOL. He has been a speaker at COMMON and the AS/400 Technical Conferences for many years.


Figure 1 Calling from CL to CL

                                                       CPU       Job
Test                                                 Seconds   Seconds

1. A CL program executes one CALL to a second CL          .6         1
   program. The called program executes a simple
   Change Variable (CHGVAR) command in a loop
   which is performed 1000 times. This is a base
   case.
2. A CL program executes 1000 CALLs to a second         15.7        17
   CL program. The called program executes a
   simple CHGVAR and then returns.
3. A CL program executes one CALL to a second CL        50.0        67
   program. The called program executes a Send
   Message (SNDMSG) command in a loop which is
   performed 1000 times. This is a base case.
4. A CL program executes 1000 CALLs to a second         64.3        81
   CL program. The called program does one
   SNDMSG command and then returns.
5. A CL program executes 1000 CALLs to a second         79.2        95
   CL program. The called program executes a
   SNDMSG command using QCMDEXC.
6. Same as #3 but with LOGCLPGM(*YES) specified.        63.2        79

Figure 2 Calling from CL to RPG

                                                       CPU       Job
Test                                                 Seconds   Seconds

1. A CL program executes one CALL to an RPG               .5         2
   program. The RPG program does a trivial
   function in a loop 1000 times. This is a base
   case.
2. A CL program does 1000 CALLs to the same RPG         35.7        37
   program. The RPG program does a trivial
   function each time it is called and then
   returns with LR on.
3. A CL program does 1000 CALLs to the same RPG         14.8        16
   program. The RPG program does a trivial
   function each time it is called and then
   returns with LR off.

Figure 3 Unqualified vs. Qualified Program Calls

                                                       CPU       Job
Test                                                 Seconds   Seconds

1. A CL program does 1000 CALLs to an RPG               14.8        16
   program. The RPG program does a trivial
   function each time it is called and then
   returns with LR off. The RPG program exists
   in the current library which places it higher
   in the library search order than any library
   on the user portion of the library list.
2. Same as previous with the program found in           24.0        26
   the 10th library on the user portion of the
   library list.
3. Same as previous with the program found in           32.4        34
   the 20th library on the user portion of the
   library list.
4. Same as previous, but a qualified call is            12.8        13
   used, e.g., CALL PGM(LIBA/PGMX).

Figure 4 Calling from RPG to RPG

                                                       CPU       Job
Test                                                 Seconds   Seconds

An RPG program does 10,000 CALLs to an RPG              15.0        17
program and the second RPG program returns with
LR off.

Figure 5 Calling from RPG to CL

                                                       CPU       Job
Test                                                 Seconds   Seconds

1. An RPG program executes 10,000 CALLs to a CL         35.7        37
   program.
2. A CL program calls an RPG program. When the         146.1       147
   RPG program needs a CL function, it returns
   with LR off. The CL performs the function and
   then calls the active RPG program again. This
   was repeated 10,000 times.

Figure 6 Command vs. Call

                                                       CPU       Job
Test                                                 Seconds   Seconds

1. SNDDTAQ command                                       7.6        10
2. QSNDDTAQ program                                      1.3         3
3. RCVDTAQ command                                       7.4         9
4. QRCVDTAQ program                                      1.2         3
5. SCNVAR command                                        6.8         8
6. QCLSCAN program                                        .9         2