I recently tested reading a table using a variety of languages and access methods (Database Performance by the Numbers, MC, January 2001) and generated a firestorm of email about the results. (See Database Performance by the Numbers Feedback, MC, March 2001 for my reply to some reader emails.) Well, I couldnt leave well enough alone and decided to perform more tests! This month, Ill show the results of several tests based on random positioning within a physical file. Ill first test positioning; then the positioning and printing of the retrieved record; and finally the positioning, reading, and writing of the retrieved record to another table.
This time I had some help designing, writing, and running the test programs from my new partners in crime, Dan Nelson and Eric Merritt, who are consultants at McCormack & Associates, Inc., in Rock Hill, South Carolina (www.mcc-associates.com). Dan and Eric read my January article, ran all of the code on their AS/400 systems, and sent me the numbers. In addition, they offered to help me devise additional tests and to assist in the programming and evaluation of the results of those tests. Who was I to thwart the good intentions of such robust and erudite programmers? So here are the fruits of our efforts, along with a few rather surprising, and not so surprising, results.
A Few Words of Caution
Before I get too deeply involved in interpreting the results, I would like to talk about the intention of these tests. The main objective was to test programming language overhead, not how fast the AS/400 could read data from a disk by positioning its disk arms. Although most of these tests are not typical of real-world situations, they do, however, offer insight into language overhead on the AS/400 by looking at particular aspects of record manipulation and throughput. Download the code yourself at www.midrangecomputing.com/mc, test it, and take the results with a grain of salt, several slices of juicy red tomato, bacon, and mayonnaise richly slathered on two slices of tasty white bread. With that said, Ill get on with our numbers and scrumptious results.
Random Lookup Test
Dan and Eric first contacted me about the record level access (RLA) and SQL numbers from the Java tests reported in my January article. They had tested RLA versus SQL from
Java about a year and a half ago and found that JDBC was much slower than RLA. The numbers in my article led them to conduct further tests, which yielded data that supported my findings, showing that IBM is indeed making progress in the performance of both SQL and RLA in Java.
They felt that further analysis should be done, and I heartily agreed, since the January tests were just a raw movement of data. What was needed were tests more suited to real-world activity and to the strengths of RLA. I wanted to test random-positioning, where I felt that RLA would be superior to SQL. At first, I wanted to test using the random number generator, but Dan felt we should use a predetermined list of keys so that each program would be on equal footing, and, therefore, the results of tests would be more repeatable.
Dan had the better idea, so he got to devise a set of test data and a key file. The result was the addition of a packed-decimal sequential number to the WEBTEMP table, used in the January article, and a file to be used as a source for the lookup values. In addition, he collapsed the Timestamp field into a Char(26) so we would not be including the overhead of the timestamp machinations, as this really skews the RLA numbers (see my March article). In addition, he added two fields to hold a header and a line-item number so we could later simulate reading orders from the file. The final file structure is shown in Figure 1.
For the random-positioning test, we decided that the number of lookups against the file should be 5,000, as this value would give us reasonable execution time in which to evaluate the language overhead. The keys were loaded into the lookup file in the following manner: 5000, 1, 4999, 2, 4998, 3, and so on. In this way, we felt somewhat assured that we would be testing random movement against the file.
The logic of the test is such that each program reads the key table into an array and then records the value of the clock. Next, the program opens the file or prepares its SQL statement and then goes into a loop, where it retrieves a record for each key in the array of keys. When the array is exhausted, the program takes a second clock value and determines the time spent between tick 1 and tick 2. We then calculate the number of lookups per second.
We ran the tests numerous times (at least 10 times each), then averaged the results. Figure 2 shows the records per second, or RPS, for each language and access method on both the Midrange Computing machine and the McCormack machine. Both machines are IBM Model 170s. MCs machine has about 1 GB of RAM, and McCormacks machine has
.5 GB of RAM. The Midrange Computing machine is on V4R5, and the McCormack machine is running V4R4, and the Java version used in all of the tests on all platforms is
The first point of interest is that the C program with embedded SQL faired far worse than the RPG program with embedded SQL. C on the Midrange Computing machine allowed an average 2,000 RPS, whereas RPG allowed for 2,414 RPS. RPG using RLA performed as expected and handily beat embedded SQL by scoring 15,625 RPS on MCs machine and 8,264 RPS on McCormacks machine. This is quite good for RLA and random- positioning.
On MCs machine, Java/JDBC with SQL scored 294 RPS, whereas Java with RLA scored 333 RPS. The Java Native Interface driver scored 384 RPS on the McCormack box and 454 RPS on the MC box. Although the numbers for Java/JDBC with SQL on the McCormack machine were similar to mine, our RLA times wildly differed. Java RLA scored 1,050 RPS on their machine and only 333 RPS on MCs. Also, note the differences in the RPG RLA and RPG SQL results between the two machines. Those differences might have something to do with memory or pool configuration, but so far we are stumped as to the reason for the differences. As soon as we find the reason, we will publish it, as it
is obvious that it impacts Java RLA performance significantly. The only other obvious difference was the RAM size of the machines, but since mine is bigger (if that is impacting the test), it should favor me. Another difference is operating system version. My machine is on V4R5 and theirs is V4R4, so we will have to wait until they upgrade their machine and see if the performance of RLA goes away to test that theory. All I can say it that we are still looking for the answer and would appreciate any pointers or results from any of you that download and run the test code.
Could the Type of Key Be a Factor?
Since finding the timestamp and date performance problems in record level access, (see my March article), I have tried to be extra cautious when conducting performance tests, and I am always looking for extemporaneous effects. So to ensure that there was not a problem with packed-decimal keys, I changed the key to an integer and then recreated the logical files. I changed the programs, then recompiled and ran the tests many times.
Statistically, the numbers were identical, so packed-decimal keys and integers seem to yield an identical performance.
Another thing I wondered about was the effect of the logical file. Since this was a DDS logical file, you can use it in the FROM clause of the embedded SQL statement whereas if the file was an SQL index you could not do this. So I ran the program where the logical file was named as the target table and where the physical file was named as the target table. Again, there was no statistical difference in performance.
Finally, each of the values in the REQNBR field is unique, but the logical file and file definition do not reflect this. So I added a primary key constraint to the physical file on the REQNBR field. This shaved about 200 milliseconds off the processing time of the C SQL program but made almost no difference to the RPG SQL program.
Since Java is cross-platform, Eric ran the next tests from his Microsoft Windows NT 4.0 machine connecting to the AS/400 via JDBC. This machine was a 700 MHz Pentium III with 256 MB of RAM. The results were surprising: Java JDBC running on NT 4.0 clocked almost the same as it did running on the AS/400. Yep, when doing the test with SQL, it did not matter if the code ran on the AS/400 or on NT 4.0; you got the same records per second. In fact, NT 4.0 edged out the AS/400 in this test by a whopping 4.9 RPS.
Figure 3 (page 77) shows a set of 10 runs on both NT 4.0 and the AS/400 doing the random-position test with JDBC. The column labeled INI (initialization) refers to the time spent preparing the SQL statement and is expressed in seconds. The column labeled Read is a measure in fractional seconds of the time it took to randomly position and read 5,000 record using SQL. What I find fascinating is that the machine running NT 4.0 also consistently prepared the SQL statement faster than the AS/400. (This might be an effect of packaging, but the JDBC drivers on the client NT 4.0 machine and on the AS/400 were both set to use packages.) Remember that when the program is running on NT, it must send the statement to the AS/400 using TCP/IP and wait for the AS/400 to prepare the statement and send an acknowledgement. The network latency should have caused the Java program to take more time on the NT machine, because there should be no network latency inside the AS/400. However, what we may have here is an example of MHz matters. See, since Java is garbage collecting and moving objects all around in RAM, the faster your processor and RAM bus are, the faster the Java can process. Eric and I are currently designing more tests so we can properly measure the effect of MHz and RAM speed on Java programs. We will let you know what we find, but in the interim, please send any other suggestions that you may have.
As for the RLA code running on Erics NT machine, it consistently clocked at half the speed of the AS/400, which is what we were expecting the JDBC code to do.
Because the tests so far have read only randomly from a table and are not real world tests, I conducted further tests based on a small modification to the RPG program that lets it print a report about each record it reads. Figure 4 shows the results of the printing tests.
What I found most interesting about these tests is that once printing was added, the records per second dropped very little for the SQL programs, but they dropped by almost half for the RLA-based programs. This could indicate that some optimization was done in the RLA programs that was not done in the SQL programs. Note that in the first tests, the programs did not actually do anything with the data; they just found the records. But the SQL programs were pulling the data into a structure. I could argue that the SQL program is doing more work. It is also possible, however, that the data was available to the RLA program but that the RLA program just didnt use it. By adding printing, if the data was not getting moved, that optimization disappeared because the program was now having to access the data to send it to the printer, which cut performance. Again, this is only conjecture, and I would love to have one of you MI heads look at the compiler output and tell all of us what is really happening.
Readin and Writin
The next tests were based on another modification to the RPG random-positioning program (the results of which are shown in Figure 5). In these tests, we were loading keys from the RANKEY2 table and then using the RANDATA02 logical file to find records by the REQHDR field. Once a record is found, the program reads the rest of the records where the REQHDR field matches (it happens to always be 10 records in this test). As it reads each record, it writes the record to the file RANDOUT, which is a mirror image of RANDATA. The key seed file contains 500 records, and since there are 10 details for each key, the program is reading and writing 5,000 records in Toto (a small dog from Kansas). The program takes a timer tick after the array of keys has been loaded, before the preparation of the select and insert statements or before the opening of any files. The program takes a second tick when the array is exhausted, computes the difference to get a total time elapsed, and then calculates seconds elapsed to perform the test.
Note that RLA triumphed once again, clocking in at .575 seconds, whereas SQL took 1.524 seconds to accomplish the task. Again, there was a difference between the average time that the McCormack box performed using RLA and SQL in RPG versus those that the MC box performed. I still think this has to do with the amount of available memory but welcome your suggestions and tuning tips.
The really interesting result is when the Java JDBC programs are executed from an NT environment; Java JDBC from an NT 4.0 client is faster than Java JDBC and Java Native Interface running on the AS/400. Personally, I think this has more to do with the clock speed of the box than with the access method. Again, network bandwidth overhead should make this slower, but the added clock speed of the PC might make up for the latency of the network by processing the instructions and garbage collection faster. Also note that Java RLA on an NT client takes double the time of Java RLA running on the AS/400. Here, I think that the network latency is kicking in and giving results more like one would expect.
Finally, I cheated a little on the last number in Figure 5, RPG SQL Bulk. You see, there is a feature in SQL called blocked inserts. I set up another copy of the program to do a blocked fetch of the 10 records that match the key into an RPG occurs data structure. I then bound the structure to an insert statement with the following syntax:
INSERT INTO RANDOM/RANDOUT 10 ROWS
VALUES (:reco :indary)
The 10 ROWS indicates that I will be sending 10 records to the statement in my data structure. The :reco and :indary are occurs structures that represent my set of data and my indicator array. Blocked inserts and selects are cool ways to get extra performance out of SQL, so you should try to put them in your programs (if you are using embedded SQL, that is). I consider this cheating because there is no program logic to handle less or more than 10 rows that match a key. In the real world, I would never always have 10 details, so I would have to get a little fancier with my code to use this technique. However, the technique is valid, and I wanted to share an example of the blocked insert technique in the downloadable code.
Final Frontier Thoughts
As in the last article, these tests are just simple programs that really dont do a heck of a lot. Yes, they are not real-world programs; they are merely attempts to set up artificial, arbitrary circumstances so that the overhead of a programming language and access method can be evaluated. Yes, the data is mostly in cache when it is running, as I dont care about factoring disk arm movements; I only care about overhead. The idea behind this is to get some basic metrics on how languages and methods perform doing simple tasks so that you can pick the appropriate tool for the job at hand or know the overhead of the tool that you are using.
I again invite you to download the code and data used in these tests and run them on your own hardware, in your own environment. If you find a way to make the program faster, by all means send it to me for a future performance article. There is a paltry amount of information on performance in the IBM documentation, and a lot of it requires a Ph.D. to understand. If we, the AS/400 community, start to share and pool our knowledge, tips, and tricks, we can eek every last cycle from our machines.
And, finally, a special thanks to Dan and Eric for all of their assistance with this article and for their tireless running and rerunning of tests. Without them, there is no way I could have written this text.
A* WEBTEMP TEST
A R RANDATAR
A REQHDR 8S 0
A REQDTL 8S 0
A REQNBR 8S 0
A REQTYPE 10A
A REQFILE 80A
A BROWSER 10A
A REQTS 26A
A REQSIZE 11A
A REQUSER 11A
A GEOID 6A
A* K REQHDR
A* K REQDTL
Figure 1: This DDS spec of the RANDATA file was used in all of the tests.
MCs McCormacks Model 170 Model 170
RPG RLA 15,625 8,264 RPG SQL 2,415 1,461
C SQL 2,000
Java RLA 333 1,050 Java JDBC 294 294 Java Native 454 384 Java on NT JDBC 299 Java on NT RLA 500
Figure 2: These are the RPS measurements of the various languages and access methods used in the random-position test.
NT 4.0 AS/400 INI Read INI Read
.141 17.094 .633 17.490
.151 16.383 .633 17.490
.160 16.253 .419 17.883
.140 16.754 .515 17.596
.131 17.324 .587 17.471
.150 16.094 .715 17.520
.160 17.195 .389 17.617
.170 16.314 .666 17.537
.161 19.367 .506 17.604
.141 17.995 .309 17.577
Figure 3: This is a log of the statement preparation time versus the time to process 5,000 records on NT and the AS/400.
MCs McCormacks Model 170 Model 170
RPG SQL 2,083 1,276
RPG RLA 8,928 4,812
Figure 4: These results show the RPS for the random read/print test.
MCs McCormacks Model 170 Model 170
RPG SQL 1.524 3.253 RPG RLA .575 .967 Java JDBC 39.062 26.178 Java Native 25.002 24.038 Java RLA 18.050 6.666
NT 4.0 Client
Java JDBC 16.025
Java RLA 16.286 RPG SQL Bulk .543
Figure 5: These results show the RPS for the random read/write test.