| READE and READPE Revisited |
|
|
|
| RPG | |
| Written by Joe Pluta | |
| Wednesday, 18 June 2008 | |
|
Those who forget history are doomed to repeat it, and in software, doomed to reinvent it.
READE and READPE are powerful RPG opcodes whose use seems to be dwindling. Part of that may just be that people don't remember them, so I thought I'd revisit them in this article. Processing a LoopThe situation is pretty simple. You want to process a set of records based on a key. This could be listing all the open order requirements for a given line or all the shipments from a specific warehouse on a given date. At first glance, this is often the domain of SQL and its unmatched ability to do ad hoc queries. To do this with native I/O, you must have built a logical file in the correct sequence; otherwise, you have to read every record in the file. But really, the same problem exists with SQL. An ad hoc query, especially over a large database, can be very slow if there is no corresponding INDEX (the SQL equivalent of a logical file).
So while a truly ad hoc query (one without a pre-built access path) is very nice in one-off situations where flexibility matters a lot more than performance, in the case of a task that is performed regularly, you will need an access path whether you're writing SQL or native I/O.
So assuming an access path is in place, then what really is the benefit of using READE or READPE over an SQL statement...or over a simple READ for that matter?
READ(P)E vs. READ
The problem when using a read is that you have to do the key checking yourself. In the case of a simple key, that's not so bad. Let's take a file called ITEMS with a single key field named ICLASS. The key you want to process is in MYCLASS. Your code looks like this:
setll (ICLASS) ITEMS; read ITEMS; dow (not %eof(ITEMS) and ICLASS = MYCLASS); exsr process; read ITEMS; enddo;
First, you use the SETLL command to position your file to the first record that matches your condition. Then you read through the file, checking each record to be sure it still fulfills that condition. This is very easy with one key. But it starts to get ugly with more than one key, and it ends up looking like this:
setll (MYFROMWHS: MYTOWHS: MYITEM: MYDATE) SHIPMENTS; read SHIPMENT; dow (not %eof(SHIPMENTS) and SWHSFROM = MYFROMWHS and SWHSTO = MYTOWHS and SITEM = MYITEM and SSHIPDATE = MYDATE); exsr process; read SHIPMENT; enddo;
As you can see, the code can start to add up. However, with READE it's easier to specify the keys:
setll (MYFROMWHS: MYTOWHS: MYITEM: MYDATE) SHIPMENTS; reade (MYFROMWHS: MYTOWHS: MYITEM: MYDATE) SHIPMENTS; dow (not %eof(SHIPMENTS)); exsr process; reade (MYFROMWHS: MYTOWHS: MYITEM: MYDATE) SHIPMENTS; enddo;
Note that now you only have to check for end-of-file using the %eof BIF. That's because READE will return end-of-file when it hits a record whose keys don't match the key fields specified on the instruction.
And if you find that to be too much typing, you can hearken back to an even older technique, the key list. Although its use is frowned upon as being less "self-documenting" than the code above, it still has its place. A key list is a predefined set of fields used as a key:
C SHIPKEY KLIST C KFLD MYFROMWHS C KFLD MYTOWHS C KFLD MYITEM C KFLD MYDATE
Then you simply use the key list:
setll SHIPKEY SHIPMENTS; reade SHIPKEY SHIPMENTS; dow (not %eof(SHIPMENTS)); exsr process; reade SHIPKEY SHIPMENTS; enddo;
On the positive side, it's a lot less code to type, and that can be an argument itself when you have to use the same key fields in many places in a program. Also, by specifying the keys in one place, you don't have the potential issue of accidentally specifying different keys at two points in the program.
On the negative side, if you want to understand what the I/O opcodes are doing, you need to refer to the key list; hopping back and forth between the actual program logic and the key list definition can be counter-productive. It's this lack of localization that has led to the KLIST opcode being generally deprecated by the programming community.
All of these same issues apply to the READPE opcode; the only difference is that you use the SETGT opcode rather than SETLL to position the file and then execute READPE to read through the file in reverse sequence. (And remember, reverse sequence is not necessarily descending order; if the access path specifies descending order on a key field, then READPE will read that field in ascending sequence!)
READ(P)E vs. SQL
This is a tougher issue to present. In most cases where you would use a partial key, an SQL cursor can function just as well. For example, if you're simply using a set of fields to limit the records processed in a given loop, this works just as well:
exec sql declare cursor c1 as select * from SHIPMENTS where SWHSFROM = :MYFROMWHS and SWHSTO = :MYTOWHS and SITEM = :MYITEM and SSHIPDATE = :MYDATE order by SWHSFROM, SWHSTO, SITEM, SSHIPDATE; exec sql open c1; fetch next from c1 into :SHIPMENTDS; dow SQLCOD = 0; exsr process; fetch next from c1 into :SHIPMENTDS; enddo;
The proponents of self-documenting code point out that this code is pretty self-contained; all the information is there, including the order of the records and the selection criteria, which is in contrast to the native I/O environment, where the key information is specified in the logical view and the selection criteria is embedded in the program logic itself. You have to decide for yourself whether that's a better way to program. I'm an old RPG dinosaur, and I grew up with ISAM access, so the fact that the file defines my access is fine by me.
Also note that in both situations I've left out error-checking. I'm old-school, and I figure that if I'm getting database I/O errors, it's either because of a bad application design or an honest-to-goodness hardware problem, and in either case, a hard halt is fine with me. If you disagree with that, let me know, and I'll address it in another article. But for now, let's try to keep things simple.
Now, a savvy SQL proponent might inquire as to what the "process" routine is actually doing, and if the processing is simple enough, you might be able to circumvent a lot of this by using a set-based SQL statement rather than a cursor. For example, if all you were doing was totaling quantity, you'd do something like this:
exec sql select sum(SQUANTITY) into :MYTOTQTY from SHIPMENTS where SWHSFROM = :MYFROMWHS and SWHSTO = :MYTOWHS and SITEM = :MYITEM and SSHIPDATE = :MYDATE;
You'd be done. Not only that, but chances are this would execute much faster than doing it via native I/O. The performance often gets better as you add other files joined in by foreign keys--for example, to calculate unit-of-measure conversions or whatnot--because the I/O tends to stay in the SQL engine and is executed at a lower level than the compiled program.
So why wouldn't you use SQL all the time? Well, if the processing is too complex or requires business logic that isn't easily expressed in SQL, you may choose to not use SQL. Or, if the processing involves doing things other than computations, such as writing to other files or calling other programs, then an aggregate SQL statement is usually not the best solution.
But that still leaves the SQL cursor approach. Where does this fail? The most typical case where the SQL cursor fails is when you need to break out of the subset. By that, I mean situations where the user wants a set of records based on some criteria but then wants records just outside that boundary. A typical example is the above case, where the last key field is the ship date. The user may want to see a list of records on May 23, but after looking at those records, may want to "page back" and see the data just prior to May 23. With an SQL cursor, the selection criteria identify a hard selection; the cursor does not contain records outside that boundary. In this case, the cursor cannot be positioned to records prior to the selection date. The only recourse is to actually create another cursor, and that can be expensive.
How expensive? Expensive enough to make it worthwhile to use traditional indexed access rather than SQL? That depends on the cursor, and in many cases the two techniques may yield very similar results, at which point you have a decision to make. And when comparing two different programming techniques that yield the same results, the intangibles sometimes play a larger part than the actual mechanics. For example, when the two approaches are otherwise equal, I prefer the simplicity of indexed access over the wordiness of SQL. I also prefer RPG over COBOL; that's just the way I am.
But before you get to the intangibles, you need to understand the mechanics, and this article has shown you how the approaches compare. |
|
|
Last Updated ( Wednesday, 18 June 2008 ) |
|
| RingerSoftware |
Re:READE and READPE Revisited
Jul 20 2008 00:37:11 LuAnnBu wrote:
then look at where the file is positioned when the READE fails... At end EOF. That's the documented behavior. Yes, a failed Read is slower than a good Read. My opinion - comparing performance of READ vs READE is fun but is splitting hairs, comparing performance of READE vs. SQL is much more relevant.I gotta disagree with this. It's not splitting hairs. File I/O is the slowest part of any RPG business application (and opening/closing files in RPG!). If you can enable record blocking where appropriate with a READ instead of READxE, your app runs faster. It zooms. It might run in 15 minutes instead of 60 minutes. I've seen it many times. Chris |
#121971 |
| RingerSoftware |
Re:READE and READPE Revisited
Jul 20 2008 00:27:13 charleswilt wrote:
READE flushes the RPG file buffers every time, just like a CHAIN does, this is the logical I/O joelEiken was talking about. Thus, no blocking.CHAIN allows input blocking too. Read the IBM RPG Reference PDF. I Just add BLOCK(*YES) on the F-spec and never use a READE or READPE on that file. You will see that compile message Rxx7086 that says "RPG handles record blocking..." that everyone seems to ignore. To eek out extra speed in an RPG batch program, I enable record blocking with CHAIN/READ combo and manually check for key changes in the DOW read loop. If you do a DSPJOB, 14, it will be obvious that record blocking is enabled for that file. The input count will increment slowly while the RRN races, and your program total run time will be shorter. Chris |
#121970 |
| JohndeCoville |
READE & READPE Revisited: Errors when reading
Jul 17 2008 03:31:52 Joe!
I find file-read errors very useful. One of them is record-lock. I use it all the time because we have several mission-critical large systems that will never be on the browser or on windows here at County Government. Thanks |
#121968 |
| charleswilt |
Re:READE and READPE Revisited
Jul 03 2008 20:30:48 My understanding:
READE flushes the RPG file buffers every time, just like a CHAIN does, this is the logical I/O joelEiken was talking about. Thus, no blocking. READ on the other hand uses blocking, as long as you don't use SETLL or if you have a SETLL then you need BLOCK(*YES) on the f-spec. http://archive.midrange.com/rpg400-l/200503/msg00889.html So, READ _can_ be significantly better performing than READE, particularly if you tune the blocking factors. On the other hand, READE always compares keys correctly even when using special key features like ALTSEQ, ABSVAL, ALWNULL. (per Barbara Morris, IBM RPG compiler team http://archive.midrange.com/rpg400-l/200503/msg00899.html ) HTH, Charles Wilt |
#121952 |
| joelEiken |
Re:READE and READPE Revisited
Jun 20 2008 08:56:54 When I say random IO, I mean that a random logical IO is performed. The system QDBGETKY routine is called for every READE. If you specify the BLOCK(*YES) parameter on the file spec for a keyed file, RPG will do blocked IO for READ and READP when used with SETLL, CHAIN, etc. Maybe not a big deal for a couple of records per key, but when looking at all records for a zip code, or a date range it can have a big impact.
|
#121919 |
| LuAnnBu |
Re:READE and READPE Revisited
Jun 19 2008 03:26:51 A READE is not a random I/O any more than a READ on a keyed logical file is random I/O. The extra overhead comes from the comparison of the keys after the sequential access and prior to returning the record to the program. If you don't believe this try using shared opens and calling a program between the SETLL and READE that repositions your file pointer, then look at where the file is positioned when the READE fails. (It has been more years than I'll admit since I have tried this, so let me know if IBM has changed how this works!) It has been my experience that READE overhead is minimal and is less important now than it was when your whole machine had 70 CPW.
That said, it is a good idea when designing your code to consider that a failed I/O will always be more expensive than a successful I/O. Coding a CHAIN that will fail most of the time in the middle of a loop is a bad idea. In the context of this discussion, a READ that fails is more expensive than a READ that returns a record. READing at the bottom of a loop and checking the keys yourself usually results in a successful I/O when the DO loop finishes (unless you were truly processing the last key value in the file and hit EOF). A READE at the bottom of the loop results in a failed I/O to end the DO loop. This is probably more important to consider when designing a benchmark test on reads than designing a business application because your benchmark will hit the end of the loop many more times than your business app, in essence exaggerating the effect. Also, this is the reason I was taught to always check the file status after the SETLL or SETGT - no reason to execute a READE or READPE that you know ahead of time is going to fail. Do you really know where your file pointer is? My opinion - comparing performance of READ vs READE is fun but is splitting hairs, comparing performance of READE vs. SQL is much more relevant. Good article Joe. |
#121916 |
| J.Pluta |
Re:READE and READPE Revisited
Jun 18 2008 23:46:12 Interesting. It's been a while since I checked that particular concept, but doing a READ on a logical file doesn't seem to me to be naturally any faster than doing a READE. That is, unless you've sorted the physical file in key sequence. But I should go back and check; maybe there's some more blocking that happens on a straight read (but like I said, it doesn't make intuitive sense to me, since the exact same records are being read in either case).
|
#121915 |
| joel@eikenhorst.biz |
Re:READE and READPE Revisited
Jun 18 2008 22:21:26 Sorry, an increase in CPU compared to using READ and comparing keys yourself in the loop control.
|
#121913 |
| J.Pluta |
Re:READE and READPE Revisited
Jun 18 2008 21:00:08 Hi Joel! An increase in CPU compared to what? In my benchmarks, I've found that an aggregate SQL call is faster than a READ loop, but an SQL cursor is not. You can make the SQL perform better if you fetch multiple records at a time, however, such as into a MODS.
|
#121909 |
| joel@eikenhorst.biz |
READE and READPE Revisited
Jun 18 2008 20:48:21 This thread discusses the Content article: READE and READPE Revisited
While READE can be easier to understand and code, there is a cost in performance. A READE is always going to perform a random IO operation. Maybe not much of a problem if you are processing 1 or 2 records for each setll but for larger record sets it will cause a significant increase in CPU usage due to logical IO. |
#121907 |





