Practical Array Processing: Dynamic Arrays
Programming - RPG
Written by Joe Pluta   
Monday, 30 March 2009 19:00

Another trick with arrays is sizing them. This article shows you how to size your arrays dynamically.

 

In an earlier article, I showed you how to initialize arrays and how to sort them based on a subfield. I did this with fairly small arrays, where you could easily define the values for the array in your D-specs. The next trick is loading arrays from disk. You don't know in advance how many elements you will load, and you really don't want to allocate the memory for the entire array.

Expanding on the Original Store-Sorting Program

The original program had a few stores hardcoded into the D-specs as initialized variables. In this version of the program, I'm going to create an array of 1000 stores and then load that array from disk.

 

A    H OPTION(*NODEBUGIO:*SRCSTMT)


These are just a couple of standard H-spec keywords that I use all the time. They both make it a little easier to debug programs.

 

B    FSTORES    IF   E             DISK    RENAME(STORES:RSTORES)


Here is the file specification for the STORES database file. I created it using DDL, so the record format name is the same as the file name; RPG requires us to rename the format.

 

C    d cStores         ds                  based(pStores)

C    d  aStores                            dim(1000)

C    d   aStoreID                     3    overlay(aStores)

C    d   aStoreCity                  20    overlay(aStores:*NEXT)

C    d   aStoreState                  2    overlay(aStores:*NEXT)

C    d   aStoreZip                    5    overlay(aStores:*NEXT)

C    d pStores         s               *

C    d nStores         s              5u 0

C    d x               s              5u 0

 

This is essentially the same code as in the previous program, except without the hard-coded values. I've defined a data structure and then an array of data structures within that data structure. I've done this (using the overlay *NEXT syntax) in order to be able to sort by subfields. The only problem here is the limit of 65535 bytes on the total size of the array.

 

One other thing you should note is that the data structure is defined as "based," using the keyword based(pStores). This means that the array has no memory assigned; instead, I need to create memory for it. At the end of the array definition, you see that I defined a pointer variable named pStores. Technically, I don't need to define the variable (the based keyword does that), but adding the additional definition line doesn't hurt. Either way, the pStores pointer is in an unknown state, probably null, when the program begins. Whatever is in it, it doesn't point to my memory, so the array is unusable. Before using the array, I need to allocate some memory.

 

      /free

D      setll *start STORES;

D      nStores = 0;

D      read STORES;

D      dow not %eof(STORES);

D        if HasDeli = 'Y';

D          nStores += 1;

D        endif;

D        read STORES;

D      enddo;

D      if nStores > 0;


Next, I check to see how many records I actually need. In this case, I simply loop through the file, looking for records that match. With SQL, it would have been much easier; I could have used a single exec sql statement to set the value of nStores to the count of records that matched my criteria. I hope to be able to show you that and a few other SQL techniques in the next installment of array processing.
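As a rough sketch of the SQL alternative mentioned above, the counting loop could collapse into a single statement along these lines. It assumes the member is compiled as SQLRPGLE, that the HasDeli field is visible to SQL as a column named HASDELI, and that matchCount is a hypothetical signed-integer host variable added purely for illustration; none of these come from the original program.

```rpgle
     D matchCount      S             10I 0

      /free
       // Count the matching rows in one statement instead of a read loop.
       // matchCount is a hypothetical host variable; HASDELI is the assumed
       // SQL column name for the HasDeli field.
       exec sql
         select count(*) into :matchCount
           from STORES
          where HASDELI = 'Y';

       // SQLCODE 0 means the statement completed normally;
       // treat any other SQLCODE as "nothing to load".
       if sqlcode = 0;
         nStores = matchCount;
       else;
         nStores = 0;
       endif;
      /end-free
```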

 

E        pStores = %alloc( nStores * 30);

E        setll *start STORES;

E        x = 0;

E        read STORES;

E        dow not %eof(STORES);

E          if HasDeli = 'Y';

E            x += 1;

E            aStoreID(x) = StoreID;

E            aStoreCity(x) = StoreCity;

E            aStoreState(x) = StoreState;

E            aStoreZip(x) = StoreZip;

E          endif;

E          read STORES;

E        enddo;


Now for the meat of today's exercise. This section of code is quite simple: I read each record, and if it matches the selection criteria, I add the record to the array. In this case, I'm directly moving database fields from the external file to the subarrays one at a time.

 

It's a little more complex, though. As I observed earlier, the data structure doesn't actually have any memory yet, so neither do the arrays; if you were to use them, you would get some serious errors. They might not even show up immediately as errors. Instead, you'd notice strange problems that you couldn't readily diagnose, which is one of the symptoms of memory corruption.

 

Note the first line of this section, which sets the value of pStores. The %alloc built-in function (BIF) will allocate the amount of memory you ask for and return a pointer. In this case, I'm asking for 30 bytes per record found (3 + 20 + 2 + 5, the combined length of the four subfields). The number of records found is in nStores, so the logical step is to allocate that number times 30. Nothing fancy, and now pStores has a value.
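Rather than hard-coding the element length, the allocation could be tied to the array definition. As one of the comments on this article notes, %size applied to an array returns the size of a single element, so a hedged variation of the allocation line might be:

```rpgle
      /free
       // %size(aStores) is the length of ONE element (30 bytes here);
       // %size(aStores:*ALL) would be the whole 1000-element array.
       pStores = %alloc(nStores * %size(aStores));
      /end-free
```

If a subfield is later added or resized, the allocation then follows the definition without anyone having to remember to change the constant.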

 

Now I can read data into the array. I read a record and once again test for the selection criteria (that HasDeli is a 'Y'). If it passes, I increment the index and add the fields from the data record to the subarrays. There's a subtle opportunity for error here, though. If a new matching record was added between the time I computed the number of matching records and the time I began loading the array, I would go past the end of the array. I don't show the code for that here, because what you might end up doing is simply going back and recomputing the count and reattempting the load. Or you could simply ignore any additional records and have an incomplete list.
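One way to take the second option (ignore any extra records) is to stop loading once the allocated elements are full. This is only a sketch of such a guard, not code from the original program:

```rpgle
      /free
       x = 0;
       setll *start STORES;
       read STORES;
       // Stop at end of file OR when every allocated element is filled,
       // so a record added after the count cannot overrun the memory.
       dow not %eof(STORES) and x < nStores;
         if HasDeli = 'Y';
           x += 1;
           aStoreID(x) = StoreID;
           aStoreCity(x) = StoreCity;
           aStoreState(x) = StoreState;
           aStoreZip(x) = StoreZip;
         endif;
         read STORES;
       enddo;
       // If matching records were deleted in the meantime, fewer were
       // loaded than counted; keep nStores in step with what was loaded.
       nStores = x;
      /end-free
```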

 

F        x = %lookup( '002': aStoreID: 1: nStores );

F        sorta %subarr( aStoreCity: 1: nStores);

F        sorta %subarr( aStoreState: 1: nStores);

F        sorta %subarr( aStoreZip: 1: nStores);

F        sorta %subarr( aStoreID: 1: nStores);


The primary difference now is how you process the arrays. When doing a lookup, it's relatively straightforward: you add two new parms that identify the starting element and the number of elements that you want to include in the lookup.
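%lookup returns the index of the matching element, or zero when nothing matches, so the result should be tested before it is used as a subscript. A minimal sketch, where foundCity is a hypothetical work field added for illustration:

```rpgle
     D foundCity       S             20A

      /free
       // Search only the loaded elements, 1 through nStores.
       x = %lookup('002': aStoreID: 1: nStores);
       if x > 0;
         // Found: x is the element number of store 002.
         foundCity = aStoreCity(x);
       else;
         // Not found among the loaded elements.
         foundCity = *blanks;
       endif;
      /end-free
```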

 

The sorta opcode is a little different. It lets you use a relatively new BIF called %subarr, which defines a subsection of an array, or "subarray." The subarray can then be sorted just like any other array. The %subarr BIF isn't implemented everywhere; you can't, for example, use it in a %lookup BIF. I would have liked to see %subarr extended there to allow a consistent syntax for all subarray processing. No matter; use the appropriate syntax in the appropriate place and you're fine.

 

D      endif;

 

This is the endif to avoid trying to process an empty set.

 

G      *inlr = *on;

      /end-free

 

And this is how you get out. You may have noticed that I didn't execute a %dealloc BIF. That's because there is no %dealloc BIF. There is a dealloc opcode, but that's a little different. I don't like the fact that I use a BIF in one place and an opcode in the other. You'll have to make your own call. Any allocated memory is released when the activation group ends, so if you're using activation groups for proper housekeeping, you may choose to go that route as well.
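For anyone who prefers to release the memory explicitly, a minimal sketch using the dealloc opcode might look like this. The (n) operation extender sets the pointer back to *null once the storage is freed; the nStores test assumes, as in the program above, that memory was allocated only when at least one record matched.

```rpgle
      /free
       if nStores > 0;
         // Free the storage obtained with %alloc and null the pointer.
         dealloc(n) pStores;
       endif;
       *inlr = *on;
      /end-free
```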

 

That's it for dynamic arrays. The last part of this mini-series will be to implement dynamic arrays in conjunction with embedded SQL. Until then, keep coding!


Joe Pluta
About the Author:
Joe Pluta is the founder and chief architect of Pluta Brothers Design, Inc. and has been extending the IBM midrange since the days of the IBM System/3. Joe uses WebSphere extensively, especially as the base for PSC/400, the only product that can move your legacy systems to the Web using simple green-screen commands. He has written several books, including E-Deployment: The Fastest Path to the Web, Eclipse: Step by Step, and WDSC: Step by Step. Joe performs onsite mentoring and speaks at user groups around the country. You can reach him at joepluta@plutabrothers.com.

 

Last Updated on Thursday, 11 June 2009 17:09
 
RingerSoftware
** This thread discusses the Content article: Practical Array Processing: Dynamic Arrays
rockym@ccgov.net
Chris, The problem is that he is only allocating the space for the number of elements - even though he defined the data structure to handle 1,000 elements. If you use the %sizeof he'd allocate space for nstores # of 1,000 element arrays, not nstores # of elements. IOW - if you made the change you offered, for 30 nStores you'd allocate 30,000 elements, rather than the 30 that is desired. Also, reading through the file twice seems counterproductive. How about using %REALLOC after each read, allowing the array to grow as you go? You could have it allocate memory every ten elements to reduce the # of times this is done (or 50 or whatever). To me, it's not good programming practice. To not deallocate memory because it doesn't fit his personal style (allocate with a BIF %ALLOC, deallocate with an opcode DEALLOC ptr) is counterproductive - on one hand you're presumably worried about memory space being used while not doing proper cleanup of memory is just silly. Somebody will look at that code and get confused on what the intended objective is and screw it up by something like putting in the %Sizeof as you suggest. You're better off just having a 1,000 element array than to play games such as this. It isn't straightforward and self-documenting.
RingerSoftware
Rocky, %Size like I suggested only gets the size of one element. To get the total array size, you would need to specify %Size(aStores:*ALL); Chris
rockym@ccgov.net
Chris, Thanks for the correction. After looking it up you are correct. However, I still think that this really isn't good programming practice for the other reasons stated.
RingerSoftware
Rocky, I agree. I'd bump it (%realloc) by 50 or 100 elements at a time, in one read loop. In all actuality, there's really nothing wrong with just going with DIM(1000) and forgetting about it. I can allocate 16 Meg of RAM in RPG in about 0.001 seconds, so a DIM(1000) of 30 bytes is *nothing* in terms of CPU usage and speed (and use %subarr or the "# of elements to search" parm). Joe was just demonstrating a technique. We get to choose where to apply it and not apply it. (Thanks Joe) Chris
J.Pluta
You're right of course, Chris. I was thinking about doing exactly what you said, and then something else popped up and I forgot completely. Thanks for pointing it out!!
J.Pluta
I'll give you the DEALLOC argument, although I thought I was pretty clear about the options. I still think there's a point to be made as to how you release memory, especially if you're in ACTGRP(*NEW), but that's a different discussion. I should have just put in the DEALLOC and then made my arguments. As to reading through the file twice, while I agree that in this trivial example it doesn't make as much sense, the technique is actually pretty powerful especially in SQL where you can get the number of records using a COUNT(*) and then read all the records in one fell swoop (which I hope to show in a subsequent tip). As to being potentially confusing, while simplicity is good complexity is sometimes required. If a technique is useful but non-intuitive, that's what comments are for. I'm not a big believer in "self-documenting" code. That's not to say that I advocate writing purposely obtuse code; I like code that is concise and readable. But I don't worry about the code being difficult when needed; we are after all programmers, and if it was easy everyone could do it.
rockym@ccgov.net
[quote]But I don't worry about the code being difficult when needed; we are after all programmers, and if it was easy everyone could do it.[/quote] LOL - I often say "If it were easy, any ol' schmuck could do it!" I realize that SQL can easily give record counts and what not. I'm just trying to figure out what the real benefit is. As Chris mentioned you can allocate a huge amount of memory very quickly, so where do you see the benefit of going through this process rather than simply allocating the memory? In my mind, there has to be some fairly substantial gains in order to not be concerned with being difficult to read. Sometimes we can be penny wise and dollar foolish so to speak, such as multiplying the date by an obtuse number (10000.01) in order to move the year around. While functional it's not clear what the goal is, it takes a performance hit, and since it took advantage of a quirk it had forward compatibility issues - i.e. free format RPG. Penny wise, dollar foolish. I know your code here is MUCH better than that, just illustrating how we can sometimes be "clever" and not be as productive as we had hoped. And while SQL does return the COUNT - it still requires reading the file through... granted it is usually done very quickly, but I would think that allocating memory is still faster. Could you give an example where this process would be beneficial?
aglauser
I'm not sure that COUNT always reads the whole file, probably depends on selection criteria, available metadata and indices. As for where this is useful, any time that making a reasonable guess as to how many elements are 'more than enough' is very difficult or impossible. I find that processing data coming in from trading partners is a common application area where this makes sense. It's difficult to predict when some change in one of the businesses involved may lead to a sudden spike in transactions. When writing code with reusability in mind, reducing arbitrary limits can increase the chance for that code to be used in unexpected ways later on.
efnkay
Interestingly the following two statements produce the same value for count(*)...
SELECT COUNT(*) FROM FILENAME
SELECT COUNT(*) FROM FILENAME FETCH FIRST 10 ROWS ONLY
However this statement will just return the first 10 rows.
SELECT * FROM FILENAME FETCH FIRST 10 ROWS ONLY
efnkay
You get it...??? The first two statements only produce one row in the result set, while the 3rd statement produces a row in the result set for each record in the table...