Basic concepts to improve system performance
You're not going to believe it. Last weekend, I was at a shoestore that I think must be IBM's blueprint for the AS/400 operating system's work management. Let me explain this further. You see, this store, All Shoes 4 Dollars (or AS/4), is so efficient that it can sell brand name shoes for only $4. AS/4 has only one salesperson -- Charles Perry Urich (aka C.P.U.). The store also employs one stockperson -- Ingred Ottoman (aka I.O.). If you are getting a little nauseous by now, just bear with me. I will be explaining basic AS/400 work management concepts. Later, I will give some suggestions for improving the performance of your system.
The All Shoes 4 Dollars store layout consists of a showroom floor, an area for wholesale buyers, a huge warehouse, and an office containing a cash register and typewriter. C.P.U. is a supersalesman who must juggle sales to several customers at the same time that he handles the wholesale buyers. He also types up sales and inventory reports on request for the store's owner, and handles phone calls. How does he handle all this demand for his time? Well, C.P.U. has devised a brilliant method for managing his work.
The typical customer wants to browse, and maybe try on a couple of pairs of shoes. He usually needs some time to decide whether or not to purchase a pair. C.P.U. therefore allocates 20 seconds of assistance time per retail customer. If a customer asks to try on a pair of shoes, C.P.U. asks Ingred (I.O.) to pull them from the stockroom. Meanwhile, C.P.U. moves on to the next order of business. If the 20-second period expires before the customer finishes talking with C.P.U., Charles politely informs the customer that he will be right back with him.
Wholesale buyers and deliverymen are more willing to wait for C.P.U., but normally take more of his time to fill their orders. C.P.U. devotes 50 seconds of his time to the wholesale buyers, but puts greater emphasis on retail customers. As a result, wholesale buyers sometimes have to wait for longer periods than retail customers; wholesale customers and deliverymen must take a number and wait. C.P.U. only works with these people individually, whereas he handles as many of the retail customers concurrently as possible. If wholesale buyers want better service, they can come in after 8 pm, or Saturdays when retail sales are closed.
Not forgetting job number 1, collecting cash and the sales and inventory reports, C.P.U. puts higher priority on those functions than on the retail and wholesale customers. These tasks take a shorter period of time, however, and C.P.U. normally can still satisfy most of his customers.
Measure Your AS/400's Shoe Size
You may think that this store is in the state of Confusion. Normally, though, things work great at All Shoes 4 Dollars. But let's travel out of the city of Analogy and into explaining how the AS/400 manages its work.
Your AS/400 has a wide variety of tasks that it must perform. Interactive and batch user jobs are the most obvious, but OS/400 also manages printing functions, communications, database, and low-level system tasks. The AS/400 allows multiple jobs and their processes to exist in main memory at one time. But AS/400 Work Management must handle contention between these jobs for main memory and CPU time, as well as for disk, printer, and screen I/O.
To designate appropriate amounts of main memory to jobs, the AS/400 divides memory into segments called pools; the AS/400 allows up to 16 pools to be defined. This subdivision of main memory into pools is analogous to the shoestore's building being divided up into rooms. When your AS/400 is shipped, it is initially set up with three pools. The first is the machine pool, which handles low-level machine functions. The second pool is *BASE, pronounced "star-base." *BASE, at this point, handles all user jobs, both batch and interactive. The last pool is QSPL, which is used for printing functions.
To more efficiently handle memory and CPU contention on the AS/400, the system allows you to subdivide main memory further. Pools are delineated by subsystem descriptions. A subsystem is simply a predefined method of handling the processing and pool allocation of jobs. A subsystem can be defined to share main memory in the *BASE pool. This subsystem's jobs will share the main memory in *BASE with other subsystems that also use *BASE. Normally though, most subsystems are defined with their own separate pools to segregate jobs that have the same types of requirements. Refer to IBM's Work Management Guide for more explicit information on modifying subsystem descriptions.
IBM also has another subsystem configuration predefined. This configuration has subsystems that are designed for specific types of jobs. The system value QCTLSBSD first must be modified to have the value 'QCTL QSYS', instead of 'QBASE QSYS', to begin this method of work management. The next IPL will then bring the system up with no less than six subsystems:
QCTL - the controlling subsystem
QINTER - handles interactive jobs
QBATCH - handles batch jobs
QCMN - handles communications jobs
QSPL - handles printing functions
QSNADS - handles SNADS network and IBM-supplied transaction programs
I say "no less than six," because you can create your own subsystems or implement some of the other ones that IBM supplies but does not activate. These six subsystems are started automatically during each IPL by the program QSTRUP in QSYS. To have other subsystems started automatically on your next IPL, you have to modify this program or retrieve the CL source into a source file and modify it there. If you would like to use a startup program other than the one in QSYS/QSTRUP, you need to change the system value QSTRUPPGM. You can also easily start or end subsystems at any time with the commands STRSBS and ENDSBS. Make sure that all critical jobs running in the subsystem you are trying to end have completed before using the ENDSBS command.
The WRKSYSSTS screen can be toggled with command key 14 to display the subsystem name associated with each of the user pools 3 to 16. In Figure 2, the display shows the number 2 beside each of the subsystem names in column 1 under "Subsystem Pools." The number 2 here refers to system pool 2, or *BASE. The numbers 4, 3, and 5 beside QBATCH, QINTER, and QSPL refer to user pools defined to the system by their respective subsystem descriptions. Figure 1, then, more succinctly displays the system pool definitions. Notice that the subsystems QCMN, QCTL, QPGMR, and QXFPCS all have *BASE defined as their shared pool.
The QCTL subsystem basically uses the strategy that Charles P. Urich used in the shoestore to control jobs. It supervises system processes and the other active subsystems.
The QINTER subsystem controls interactive jobs. These jobs are similar to retail customers. Interactive users want to browse through the database and perhaps do some maintenance. Usually there is more "think time" than actual CPU time. Interactive users, though, expect good response times on their jobs, and obviously there are always numerous jobs running at any given time.
Batch jobs are handled with the QBATCH subsystem. Batch functions can be compared to the requests of wholesale customers and delivery personnel. These jobs do not demand immediate response, but they do require a high degree of CPU usage to handle extensive file manipulation and report generation.
The QSPL subsystem supervises writer and printer activity like Charles' typewriter work did, and QCMN could be considered similar to Charles' handling phone calls and inquiries.
The All Shoes 4 Dollars shoestore had certain amounts of C.P.U.'s time allotted for each function. The AS/400 also does just that. Subsystems have an attribute known as time slice. What is a time slice? A time slice is the amount of processor time a job is allowed before other waiting jobs are given the opportunity to run. The time slice for interactive jobs is initially set to 2 seconds, while batch jobs have a time slice of 5 seconds. This normally works fine since, as I said earlier, the CPU time required for batch jobs is significantly higher than interactive's.
Charles also put a higher emphasis on certain functions. In this way he knew what work to do next. The AS/400 handles this by having a priority definition in the subsystem. Priorities are numerical: the lower the number, the higher the priority. The shipped system has the following priorities set:
System Jobs           0
Operator Jobs        10
Spool/Printer Jobs   15
Interactive Jobs     20
Batch Jobs           50
These priorities define which jobs get CPU time first. If a job's intermediate task is not completed in the allotted time slice, it loses its position and must wait for its next chance to run along with the other jobs in the system. The next job is then selected to run based on the highest-priority job waiting for time.
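If you like to see a rule in motion, here is a rough Python sketch of the selection rule just described. It models one processor that always picks the highest-priority waiting job (the lowest number), runs it for at most one time slice, and then sends it to the back of the line. Only the priorities (20 and 50) and the time slices (2 and 5 seconds) come from the shipped values above; the job names, the CPU demands, and the model itself are assumptions for illustration. The real system also deals with I/O waits and activity levels, which this sketch ignores.

```python
from collections import deque

def simulate(jobs):
    """Run jobs to completion on one simulated CPU.

    jobs: list of (name, priority, time_slice_seconds, cpu_seconds_needed).
    Returns the dispatch trace as a list of (name, seconds_run).
    Simplified model: no I/O waits, and every job is ready at time zero.
    """
    queues = {}  # one first-in, first-out queue per priority
    for name, priority, time_slice, needed in jobs:
        queues.setdefault(priority, deque()).append([name, time_slice, needed])

    trace = []
    while any(queues.values()):
        # Lowest number = highest priority among queues that still hold work.
        priority = min(p for p, q in queues.items() if q)
        job = queues[priority].popleft()
        name, time_slice, needed = job
        ran = min(time_slice, needed)
        trace.append((name, ran))
        job[2] = needed - ran
        if job[2] > 0:
            queues[priority].append(job)  # slice expired: back of the line
    return trace

# Hypothetical workload: one batch job and two interactive jobs.
jobs = [
    ("BATCH1", 50, 5, 8),
    ("INTER1", 20, 2, 3),
    ("INTER2", 20, 2, 2),
]
trace = simulate(jobs)
for name, ran in trace:
    print(f"{name} ran {ran}s")
```

In this toy run the two interactive jobs take turns until both finish, and only then does the batch job get the processor, two slices in a row.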
The AS/400 will try to keep as many jobs in main memory as possible at one time. Obviously, there is a ceiling to the number of jobs that can be physically in main memory. This is transparent to us, since the AS/400 simply purges or "pages" interactive jobs to disk. (Charles tried using the parking lot for this function but the customers didn't go for it.) Each job has what is known as a PAG, or process access group. It is the PAG that gets purged to disk. A job's PAG contains the objects that the job is using but that aren't being shared with other jobs. Program variables, file overrides, open data paths, and record buffers are all examples of PAG objects. Programs are not a part of the PAG, because AS/400 programs are reentrant. Reentrant programs allow one copy of a program in main memory to be shared among many jobs. This is one method that the AS/400 uses to improve performance.
At some point, the amount of work that Charles is expected to do for the shoe store becomes excessive. This can be handled easily by hiring more help and expanding the store. Perhaps, the overload is just an influx of customers. Maybe though, the store just can't afford another salesperson or stockperson.
The AS/400 can have the same overload problem. At that point the system goes into a state known as "thrashing." This occurs when the AS/400 spends more of its time paging jobs to disk than doing actual work. Certainly, IBM will be ready to sell more main memory or disk to solve this problem. But we really need to understand the demand imposed on the system and alternatives for controlling it before throwing money at the problem.
Activity levels for a subsystem are one method of controlling thrashing. An activity level is the number of jobs that can run simultaneously in a storage pool. Very often this number is either too low or too high. If the figure is too low, the memory allocated to the pool is underused. If it is too high, the memory is overcommitted and thrashing occurs. Activity levels can be set with the POOLS parameter in the subsystem definition or on the fly through the WRKSYSSTS display. Typically, the interactive subsystem definition allows an unlimited number of users to sign on, but the actual number of jobs that are in main memory at one time must be evaluated periodically.
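One quick way to get a feel for an activity level is to divide a pool's size by it. The Python sketch below does that arithmetic for pools 2 through 5, using the pool sizes and "Max Act" values shown in Figure 1. The function name is made up for this example, and the result is only a rough indicator: nothing here says how much storage per job is enough, which depends on the work being done.

```python
# Pool size (K) and activity level for pools 2 through 5, as shown in Figure 1.
# Pool 1 (the machine pool) is left out because its activity level shows as "+++".
pools = {2: (6352, 4), 3: (2000, 10), 4: (5000, 1), 5: (32, 4)}

def storage_per_active_job(size_k, activity_level):
    """Average K of pool storage per job when the pool is running at its activity level."""
    return size_k / activity_level

for pool, (size_k, activity_level) in pools.items():
    share = storage_per_active_job(size_k, activity_level)
    print(f"pool {pool}: {share:.0f} K per active job")
```

Raising an activity level shrinks each job's share of the pool, and lowering it does the opposite, which is the trade-off described above.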
The amount of memory allocated to each pool is another area that needs to be analyzed. Very often memory sits idle in one pool while another pool is choked with too much work and not enough memory. Refer to the other article in this issue on system performance for more information on pool sizes and activity levels.
I have oversimplified, somewhat, the concepts of subsystem descriptions and their corresponding time slices, priorities, and activity levels. Subsystem descriptions are actually a little more complex. They point to other objects each of which has its own purpose in work management. Figure 4 shows the Create commands for each of the objects pertaining to the QBATCH subsystem. Refer to that figure as well as the Work Management Guide to gain further understanding of job control. Appendix B of that manual lists all the shipped subsystems already in use or that may be implemented with little effort. QPGMR is a good example of a usable subsystem. By implementing this subsystem, you can separate programmers' compiles and batch testing from the QBATCH subsystem.
How To Put Track Shoes on Your AS/400
Now that you have a better understanding of the basics of work management on the AS/400, let's get down to the specifics of accelerating the operation of your machine. You can improve the performance of your system in a number of areas. Some of these changes can be implemented immediately; some take a certain amount of time and effort; others are application design methods that you may have heard before, but are worth repeating.
When segmenting your main memory into pools to separate job functions, there are a few basic rules you need to follow. First, try to use only one priority per pool. Second, in pools allocated for batch jobs, use single-threading. Single-threading can be achieved by setting the POOLS parameter of the subsystem definition to specify an activity level of 1. Having only one job queue for the batch subsystem with the MAXACT parameter set to 1 is another alternative. Third, if a batch subsystem is in use less than 50 percent of the time, start and stop the subsystem as needed. Refer to article two in this issue for more specifics on pool sizing.
Rethink Batch Work
Review the daily runs of batch jobs, many of which should be scheduled to run at night. Ask users for suggestions on what batch functions could be added to your night schedule. Alternatively, you could put batch job queues on hold and release them at night or, better yet, use QUSRTOOL's Submit Time Job command. These last two methods of rescheduling batch jobs still leave users in control of, and responsible for, batch job submission.
Be aware of the effect of ad hoc requests on normal processing. Move those ad hoc reports to night runs, if possible. Also, fully test ad hoc programs and queries with subsets of the data, and have the requesting user OK the resulting format of the report before submitting the run. You can waste hours of processing time only to have the user reject the report because of improper calculations or invalid selection criteria (which he probably gave you). Also watch out for poorly performing queries that are run often -- rewrite them either with better query logic or as a program. I might also suggest that everyone read the appendix of the AS/400 query manual. This brief but well-written section gives excellent suggestions for designing efficient selection logic and file linking. These tips can be used for SQL and OPNQRYF as well as for AS/400 Query.
Look for interactive jobs that should be run in batch. Every system seems to have at least one program that is in this category, especially applications converted from the S/36. AS/400's work management system simply wasn't designed for I/O-intensive applications to be run out of the QINTER subsystem. Even some online transaction and file maintenance programs could be converted to run in a semibatch mode. Heavy updates and complex maintenance work could be more efficiently handled by shorter interactive programs that send information to data queues. Asynchronous jobs can then do the heavy processing.
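The semibatch idea is easier to see in miniature. The Python sketch below is only an analogy, not AS/400 code: `queue.Queue` stands in for a data queue, a worker thread stands in for the asynchronous job, and the order numbers are invented. The point is the shape of the design: the "interactive" side hands off a request and moves on, while the heavy work happens elsewhere.

```python
import queue
import threading

work = queue.Queue()  # stands in for a data queue
results = []          # record of what the background job has processed

def background_job():
    """Asynchronous job: take requests off the queue and do the heavy update."""
    while True:
        request = work.get()
        if request is None:  # sentinel value: no more work is coming
            break
        results.append(f"updated {request}")  # placeholder for the heavy processing

worker = threading.Thread(target=background_job)
worker.start()

# "Interactive program": queue each request and return to the user right away.
for order in ("ORDER001", "ORDER002", "ORDER003"):
    work.put(order)

work.put(None)  # tell the background job to finish
worker.join()   # wait here only so the example can print a complete result
print(results)
```

With a single background job the requests are processed in the order they were queued, so the interactive program never has to wait for the update itself.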
Logical Use of Files
OPNQRYF is more efficient than logical files when using its select/omit logic. But if the selection is static, a logical file could be a better choice. Perhaps you could try using the select field as a key and use SETLL to process. Use existing key structures in your OPNQRYF key selection, since the database manager will use any access paths found that are like the OPNQRYF key fields.
Don't go to extremes creating logicals over your database. Every access path added to a file adds overhead when that file is updated. Even if the added logicals are not open during the update of the physical file, the system defaults to maintaining the "sort" of all the logicals. The AS/400 allows you to specify exactly when access paths should be updated. The default access path maintenance option is IMMED, but there are two more options. Setting the maintenance to DELAY causes the system to update access paths "when it gets the time." The third option, REBLD, is used basically for batch jobs, where the path gets completely rebuilt on open. Use the IMMED option for the primary access path over the unique key, DELAY for secondary access paths, and REBLD for batches, or use OPNQRYF for those batch jobs.
When using select/omit in logicals, set the DYNSLT parameter to static only for those logicals that are heavily used, mostly for interactive work. Set DYNSLT to dynamic for logicals used in batch runs or in less-used interactive requests. The DYNSLT parameter specifies whether the select/omit access path should be maintained immediately on any update of that file, or if it should be delayed until a read request. DYNSLT of dynamic cuts down on access path maintenance overhead, but it does slow down retrieval time for those select/omit logicals.
Fine-tuning programs could significantly improve response times. This may appear to be an insurmountable task, but just use the old 80/20 rule. 20 percent of the programs are used 80 percent of the time. Some of those heavily used programs may have been written poorly or perhaps were converted from systems that didn't have the advantages that the AS/400 has.
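Finding that busy 20 percent is mostly a matter of sorting and adding. The Python sketch below ranks programs by how often they are called and keeps taking from the top until the chosen share of all calls is covered. The program names and call counts are hypothetical, and the article does not say where such counts would come from; treat this as a sketch of the arithmetic only.

```python
def heavy_hitters(usage, share_pct=80):
    """Return the most-used programs that together cover share_pct percent of all usage.

    usage: dict mapping program name to a usage count (for example, number of calls).
    """
    total = sum(usage.values())
    picked, running = [], 0
    ranked = sorted(usage.items(), key=lambda item: item[1], reverse=True)
    for name, count in ranked:
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= share_pct * total:
            break
        picked.append(name)
        running += count
    return picked

# Hypothetical call counts for six programs.
usage = {
    "ORDENT": 500,
    "CUSTINQ": 300,
    "INVUPD": 100,
    "PRTLBL": 50,
    "ARAGING": 30,
    "GLPOST": 20,
}
print(heavy_hitters(usage))
```

In this made-up sample, two of the six programs account for 80 percent of the calls, so those two are where tuning effort would pay off first.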
Write it Right
Normalize that database. If you don't have the expertise in your shop, or even if you do, have someone else who does review your design. Normalized databases improve the efficiency of the system. There is some argument about what normal form should be used for files. Some feel that second normal form should be used for transaction files, and that varying levels of normal forms from second to fourth be used for master files. Nevertheless, this is a highly technical issue, and everyone should at least be cognizant of the need for this expertise.
Structured applications can run significantly better than unstructured ones. Your code should flow logically from top to bottom, with no GOTOs to far-flung sections of code. Commonly used code, such as input-output processing, should be grouped together. Error handling should be put in subroutines or, better yet, in separate programs. Writing small modular programs can improve performance. The number of times a program is to be called and the type of processing it will be doing can affect the amount of improvement, however. With the AS/400's reentrant programs, having numerous small programs allows a greater chance for the sharing of program code in main memory. But when returning from called programs that may be used again in that job, don't set on LR; instead, use the RETRN opcode. When subprograms are no longer needed for that job, use the CL command RCLRSC to remove the allocated objects for that program.
Review disk arm activity and capacities periodically. Average disk arm utilization should be 40 percent or less, or fewer than 10 disk I/Os per second per arm. If disk utilization gets too high, first review your database. Often, shops keep more information than is required on their systems. Purge those database files regularly. Disk capacity should not be much greater than 70 percent for a system's disk to run efficiently. Your IBM representative will gladly sell you more DASD once disk performance diminishes. Happily, new low-cost disks are now becoming available with a greater disk arm-to-megabyte ratio.
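These disk guidelines are simple enough to turn into a checklist. The Python sketch below tests three measurements against the thresholds given above. The function name and the sample arm figures are assumptions for illustration; the 80.7904 percent figure is the "% aux stg used" value from Figure 1. I have treated the arm utilization and I/O rate limits as two separate checks and "not much greater than 70 percent" as a hard 70 percent limit, which is one reading of the guideline rather than the only one.

```python
def disk_warnings(arm_util_pct, ios_per_sec_per_arm, capacity_used_pct):
    """Return a list of guideline violations for one set of disk measurements."""
    warnings = []
    if arm_util_pct > 40:
        warnings.append("arm utilization above 40%")
    if ios_per_sec_per_arm >= 10:
        warnings.append("10 or more disk I/Os per second per arm")
    if capacity_used_pct > 70:
        warnings.append("disk capacity above 70%")
    return warnings

# Arm utilization (25%) and I/O rate (6 per second) are made-up sample readings;
# the capacity figure is the "% aux stg used" value shown in Figure 1.
for warning in disk_warnings(25, 6, 80.7904):
    print(warning)
```

By this reading, the system in Figure 1 is already past the capacity guideline and is a candidate for a database purge.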
Several system values should also be checked. The QACTJOB system value should be approximately the figure that is displayed in the "active jobs" section of the WRKACTJOB screen during heavy use. If this value is too low, the system arbitrator uses some 35 percent of the system's resources to adjust workload definitions when the total number of active jobs exceeds this value. The QTOTJOB system value should also be set at what is in the WRKSYSSTS screen's "Jobs in System" value during heavy-use periods.
Back to Selling Shoes
If you are still confused about how the system manages work on your system and don't have any new ideas for use of subsystems; if you tried at least some of the performance guidelines above without improving your system, let me know. I'll go back to selling shoes for a living.
Work Management and Performance
Figure 1: WRKSYSSTS System Status Screen

                       Work with System Status                    DONLEE
                                                      05/13/91  01:22:27
% CPU used . . . . . . . :     1.3    System aux stg . . . . . :   2202 M
Elapsed time . . . . . . : 0:00:43    % aux stg used . . . . . : 80.7904
Jobs in system . . . . . :      60    Total aux stg  . . . . . :   2202 M
% perm addresses . . . . :   2.604    Current unprotect used . :    238 M
% temp addresses . . . . :   1.012    Maximum unprotect  . . . :    267 M

Sys     Pool    Rsrv   Max  -----DB-----  ---Non-DB---  Act-  Wait-  Act-
Pool   Size K  Size K  Act  Fault  Pages  Fault  Pages  Wait   Inel  Inel
  1      3000    2358  +++     .0     .0     .0     .3    .0     .0    .0
  2      6352       0    4     .0     .0     .0     .0    .0     .0    .0
  3      2000       0   10     .0     .0     .0     .2   1.3     .0    .0
  4      5000       0    1     .0     .0     .0     .0    .0     .0    .0
  5        32       0    4     .0     .0     .0     .0    .0     .0    .0
                                                                  Bottom
===>
F11=Display pool data   F21=Expand views
Figure 2: WRKSYSSTS Subsystems Screen

                          Work with Subsystems
                                                      System:   DONLEE
Type options, press Enter.
  4=End subsystem   5=Display subsystem description
  8=Work with subsystem jobs

                     Total     -----------Subsystem Pools------------
Opt  Subsystem    Storage (K)   1   2   3   4   5   6   7   8   9  10
     QBATCH           5000      2   4
     QCMN                0      2
     QCTL                0      2
     QINTER           2000      2   3
     QPGMR               0      2
     QSPL               32      2   5
     QXFPCS              0      2
                                                                Bottom
Parameters or command
===>
F3=Exit   F5=Refresh   F11=System data   F12=Cancel   F14=System status
Figure 3: Performance Guidelines

Pool Rules
  One priority per pool
  Single-thread batch pools
  Batch subsystem in use < 50% of time - start and stop as needed
Rethink Batch Work
  Move daily batch work to night
  Control ad hoc requests
  Convert interactive jobs that should be run in batch
Logical Use of Files
  OPNQRYF if select/omit not static
  Use logical file if select/omit is static
  Make select field a key and use SETLL
  Use existing key structures on OPNQRYF
  MAINT(IMMED) for primary access path
  MAINT(DELAY) for secondary access paths
  MAINT(REBLD) for batch work or use OPNQRYF
  DYNSLT as static for heavy interactive use
  DYNSLT as dynamic for less used or batch work
Fine Tuning Programs
  80/20 rule
Write it Right
  Normalize that database
  Structure applications
  Small modular programs
Hardware
  Disk arm utilization should be <= 40%
  Disk capacity should be <= 70%
  Purge databases
  Buy more DASD
System Values
  QACTJOB = WRKACTJOB "active jobs"
  QTOTJOB = WRKSYSSTS "jobs in system"
Figure 4: Create Commands for QBATCH

CRTSBSD  SBSD(QGPL/QBATCH) POOLS((1 *BASE) (2 500 1)) +
         MAXJOBS(*NOMAX) TEXT(' Batch Subsystem')
CRTCLS   CLS(QGPL/QBATCH) TIMESLICE(5000) PURGE(*NO) +
         DFTWAIT(120) TEXT('Batch Class')
CRTJOBD  JOBD(QGPL/QBATCH) USER(QPGMR) LOG(4 0 *NOLIST) +
         RTGDTA(QCMDB) TEXT(' Batch Job Description')
CRTJOBQ  JOBQ(QGPL/QBATCH) TEXT('Batch Job Queue')
ADDJOBQE SBSD(QGPL/QBATCH) JOBQ(QGPL/QBATCH) MAXACT(1)
ADDRTGE  SBSD(QGPL/QBATCH) SEQNBR(9999) CMPVAL(*ANY) +
         PGM(QSYS/QCMD) CLS(QGPL/QBATCH)