To complement the reports and inquiries provided by their standard ERP system, most iSeries shops use some kind of reporting tool to provide extra information to their users.
Whether we use Query to drive an iSeries report, SQL or OPNQRYF to provide green-screen or Web inquiries, or a PC data analysis tool via ODBC, the system always uses the same API set to retrieve the data and return it to the calling application. Probably the most important component—and frequently the least understood element—of this API set is the Query Optimizer. Although this routine has only one purpose (namely, to decide the quickest method of accessing the data requested), it is one of the most sophisticated and intelligent routines that runs on the iSeries.
The likelihood is that the Query Optimizer is sitting in the background all the time on your system, being called many times every hour and making thousands of decisions every day about how to access your data in the most efficient method.
By default, this logic is hidden from view. This article will show you how to easily see the decisions the optimizer makes and how to improve its performance by judiciously creating logical files or indexes. At the end, you will see a real example of how this works. First, let's look at how to access the decisions the Query Optimizer has made.
(Note: Please bear in mind that whenever I refer to Query, this is a generic term meaning Query400, SQL, or OPNQRYF.)
How to Gather Information About Query Optimizer Decisions
As is always the case with the iSeries, there is more than one method of achieving this. If you turn on debug for a job by using STRDBG UPDPROD(*YES) and then run a query, the decisions the optimizer makes will be placed into the joblog. Alternatively, you can modify the query options file QAQQINI to log the information. You could use APIs in your code to do this, but by far the easiest method I have found is to use the STRDBMON command:
Running this command just once will cause OS/400 or i5/OS to log details of all query optimization decisions made across your whole system to a file called DBMON in library DBQRYLOG. The library has to already exist, but the file will be created if necessary. This file will give you all the information you could ever require about the use of queries on your system. And the command will stay in force until you stop it, even after a system IPL. When you wish to end the monitoring, just enter this command:
The file is structured so that there are different record formats, providing varied information about actual query usage. The first field on the file (QQRID) is the record identifier. An ID of 1000 is the "header" information about the query and can be used to identify the user and job details of the query, and the date and time the query started, and the source of the query (Query, SQL, or OPNQRYF). Similarly, if you look at records with an ID of 3007, field QQ1000 contains details of all the access paths that were considered by the optimizer. Details here are the library name and file name followed by a code. A code of zero means that this access path was used by the query. A non-zero value means that this access path was not selected—for example, a code of 5 means that the keys of the access path do not match, while a code of 19 means the access path cannot be used for a join as it contains select omits.
The records that we are going to concentrate on here are those in which Query Optimizer recommends that an access path be created. These are any that have a Y in field QQIDXA.
How to Interpret the Query Optimizer Data
By far the biggest impact on your data retrieval performance will be the creation of the indexes recommended by Query Optimizer. In order to get the most meaningful information, I recommend that you let the monitor process take place for at least two weeks. This is because you want to create the minimum number of indexes to achieve the most impact. For example, if on day one the Query Optimizer recommended you create an index over your outstanding orders file by sales area, you could go ahead and create an appropriate index and some queries would run faster immediately. It may well be, though, that overnight some jobs are run that recommend building an index on sales rep within sales area. Similarly, when you get to the end-of-week routines, the optimizer may recommend an index keyed on customer within sales rep within sales area. If you created an index at each stage, the system would be maintaining three indexes, rather than just one if you waited until the end of the week.
To see which indexes have been recommended, run the following SQL statement (or the equivalent query):
ORDER BY QQPTLN,QQPTFN,QQIDXD
This statement pulls out records from the monitor file where the optimizer has recommended creation of an index to improve performance (QQIDXA = 'Y'). It displays the library (QQPTLN) and physical file (QQPTFN) used and the recommended keys (QQIDXD). When you run this statement, you will most likely see a status message saying "Building access path" or "Building hash table." This is because the optimizer has decided to build an index to present the data in the sequence you requested.
If you still have the DBMON logging active and you run the SQL statement again, you should see entries for the first run of the SQL statement, recommending that you build an access path keyed on QQPTLN, QQPTFN, and QQIDXD. (On your system, the access path recommendation may be different; this will be entirely dependent upon the makeup of the data in the log file. If yours is different, please proceed but assume your recommendations are the same as my example.) As a working example of what this article is all about, let's follow its recommendations and create an index to do this using SQL:
This creates an index called DBMONL1 in library DBQRYLOG. It's based on file DBMON in library DBQRYLOG, and its key fields are QQPTLN, QQPTFN, and QQIDXD. You could achieve a similar result by keying in DDS and creating a logical file.
When you run the SQL SELECT statement again, you should notice that the query is satisfied much more quickly, although there may still be a slight delay because you're doing a record selection on the file. To go that extra mile, delete the index:
DROP INDEX DBQRYLOG/DBMONL1
Then, re-create it with the record selection field specified as an extra key.
CREATE INDEX DBQRYLOG/DBMONL1 ON DBQRYLOG/DBMON (QQPTLN, QQPTFN, QQIDXD, QQIDXA)
The statement will now run just a little bit faster. This is because the database engine can now read just the records required rather than reading them all and performing dynamic record selection.
How to Decide Which Logical Files/Indexes to Create
It's impractical to create an index for every recommendation issued by the optimizer. If you did, you would end up with thousands of extra logicals on your system. What you need to decide is what will give you the most benefit for the least overhead. My favorite way of doing this is to summarize the log file, grouping by physical file library and name and also by access path recommendation. If you also total the number of rows, you can see which access paths the system is spending the most time building. Here's the SQL statement to achieve this:
SELECT QQPTLN,QQPTFN,SUM(QQTOTR),QQIDXD FROM DBQRYLOG/DBMON
WHERE QQIDXA='Y' GROUP BY QQPTLN,QQPTFN,QQIDXD
ORDER BY QQPTLN,QQPTFN,QQIDXD.
Look for the entries with the highest number of rows and check to see if there is another physical file that may have the same recommended keys but with some extra trailing keys. Then, create the index using either SQL or DDS.
Worried About Logical File Proliferation?
In all my time working on the iSeries and its predecessors, I have heard many conflicting opinions about how many logical files, or indexes, it is wise to have over any particular physical. I remember one respected figure saying that under no circumstances should you ever have more than 20. When considering this issue, though, the main criterion is undoubtedly how volatile the physical file is. Once the logical has been created, the only overhead associated with it is when a record is inserted or deleted or whenever a key value in the logical changes.
So, for example, if you had a very large table containing all the previous years' sales history, which you added to only once a year, you could create as many logicals as you wished over this data because the only overhead would be when the records were being added. At the opposite end of the spectrum would be something like an inventory movements file, in which every transaction is recorded and you may be adding thousands of records every hour. Having many hundreds of logicals over this file would probably result in some kind of performance degradation.
To view the IBM documentation for STRDBMON and the other methods, click here.
MC Press Online