It's the end of the week, and, just as you're getting things cleaned up in preparation for the weekend, a coworker comes into your office and explains that she must have immediate access to a file that was saved to tape and deleted. After looking through copious save logs, you identify the tape containing the file, travel to the building containing the off-site backups, retrieve the desired tape, return to your machine room, and, finally, restore the file. Congratulations! You've just experienced the most basic example of hierarchical storage management (HSM).
In this article, I will discuss the concept of HSM and some of the available commercial products that implement it. I'll also discuss how you can use the tools that come with OS/400 to implement your own HSM solution, including a way for your programs to access data stored offline without any modifications.
Storage on any computer comes in various layers. The fastest storage, in the registers of the CPU itself, is extremely expensive on a per-byte basis. The next step down in speed is cache memory, followed by RAM, fast hard disks, compressed hard disks, and, last but not least, removable storage such as optical discs, tape, and floppy disks. The price per byte of storage decreases as the access time increases. Thus, there is a huge difference between the cost of storing one byte in a CPU register and the cost of storing one byte on a tape, just as there is a huge difference in the time it takes to access those bytes.
In a perfect world, every computer would contain enough RAM to house every datum and program that would ever be needed. The speed of this fantasy computer would be incredible. At the other extreme, a computer would have only enough disk and memory to process one job at a time. Such was the situation in the early days of computing, when programs and data were stored on punched cards or the first floppy disks. Today's IT manager is in charge of one or more computers that fall somewhere between these two extremes. For the remainder of this discussion, I'll assume that the AS/400s you're responsible for have enough memory that a shortage of memory is not the cause of any performance problem.
Hierarchical storage management has been part of managing a computer system since the very first computer was brought online. The premise is quite simple: Maximize system transaction throughput by storing the most frequently required data on the fastest (most expensive) devices, while storing less frequently accessed data on the slowest (cheapest) devices. The quantity of storage media is based, as always, on budgetary constraints and access requirements.
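To make the premise concrete, here is a minimal sketch of a greedy placement rule, written in Python purely for illustration; it is not how OS/400 or any product mentioned here decides placement. It assumes each data set has a known size and access rate and that tiers are listed fastest (most expensive) first. The tier names, capacities, and access rates are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity: int  # free space, in arbitrary storage units (hypothetical)


def place(datasets, tiers):
    """Greedy HSM placement sketch.

    datasets: list of (name, size, accesses_per_day) tuples.
    tiers: list of Tier objects ordered fastest (most expensive) first.
    The most frequently accessed data claims the fastest tier that still has room.
    """
    free = {t.name: t.capacity for t in tiers}
    placement = {}
    for name, size, _rate in sorted(datasets, key=lambda d: d[2], reverse=True):
        for tier in tiers:
            if free[tier.name] >= size:
                free[tier.name] -= size
                placement[name] = tier.name
                break
        else:  # the inner loop never hit "break": no tier had room
            raise ValueError(f"no tier can hold {name}")
    return placement


tiers = [Tier("fast disk", 100), Tier("compressed disk", 300), Tier("tape", 10_000)]
datasets = [
    ("open A/R", 60, 500),
    ("closed A/R", 200, 5),
    ("prior-year A/R", 900, 0.1),
    ("order history", 80, 50),
]
placement = place(datasets, tiers)
print(placement)
```

With these numbers, order history lands on compressed disk even though it is the second-busiest data set, because only 40 units of fast disk remain after the open receivables claim theirs. That is the budget constraint at work.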
Planning for maximum transaction throughput includes not only the memory and disk requirements of the computer system but also the time of the animated overhead (the people) who service or use the computer.
What's That Data Worth?
The reality-based anecdote at the beginning of this article demonstrated a large time-based cost for the retrieval of a file. Another way to look at this is from the viewpoint of a datum's value. An open accounts-receivable entry is typically more valuable than a closed accounts-receivable entry, so I'm willing to store it on pricier disk real estate. As that entry moves from open accounts receivable to closed accounts receivable, from current month to previous month, from current quarter to previous quarter, from current fiscal year to previous fiscal year, and from pre-accounting audit to post-accounting audit, its value to the corporation decreases. As the immediacy of the data decreases, so does its value, which means it can be staged through its life cycle to increasingly slower and cheaper devices. HSM is all about deciding when the data should be devalued and ensuring that it remains at the appropriate level of accessibility throughout its life.
There are many options for staging data on a large AS/400 configuration. Such a configuration typically has the requisite system auxiliary storage pool (ASP), plus a number of user ASPs, and possibly a tape or optical library.
As an example, the aforementioned open accounts-receivable entry would be created on the system ASP, presumably on the fastest drives available. The entry would remain there until the account is paid. Once the debt is satisfied and the entry is only of historical value, it could be moved to a user ASP, perhaps on a compressed drive where access speed is traded for denser storage. As time progresses, the entry might migrate to a tape in a tape library or onto an optical medium in an optical library. At some point, it would find its way into the corporate archives or into the bit bucket.
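The life cycle above can be expressed as a simple rule that maps an entry's state to a tier. This Python sketch is hypothetical: the tier labels, the one-year thresholds, and the assumption that the fiscal year matches the calendar year are illustrative choices, not part of any product.

```python
from datetime import date

# Hypothetical tier labels that mirror the accounts-receivable example.
SYSTEM_ASP = "system ASP"
USER_ASP = "user ASP (compressed)"
LIBRARY = "tape/optical library"
ARCHIVE = "corporate archive"


def tier_for(paid_on, today):
    """Return the tier an A/R entry belongs on.

    paid_on is None while the entry is open, otherwise the date the debt was paid.
    Assumes, for simplicity, that the fiscal year is the calendar year.
    """
    if paid_on is None:
        return SYSTEM_ASP          # open: most valuable, fastest disk
    years_closed = today.year - paid_on.year
    if years_closed == 0:
        return USER_ASP            # closed this year: slower, denser disk
    if years_closed == 1:
        return LIBRARY             # closed last year: removable media
    return ARCHIVE                 # older: the archives (or the bit bucket)


today = date(2000, 4, 24)
for paid in (None, date(2000, 2, 1), date(1999, 11, 30), date(1997, 6, 1)):
    print(paid, "->", tier_for(paid, today))
```

A real policy would likely use several of the milestones listed earlier (month end, quarter end, audit) rather than a single age test.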
IBM has several software products that will help you create an HSM solution, such as ADSTAR Distributed Storage Manager (ADSM) for the AS/400, a client/server solution that permits mixed clients to store data on one or more servers. For instance, one AS/400 could serve as a repository for backups of PC clients and other AS/400s. ADSM client software is available for all the popular PC operating systems, including the Windows variants, OS/2, and Linux. ADSM can be set up so that clients send their backup data to the AS/400 at their convenience, or they can be polled by the AS/400 when it is ready to accept their data. Once the data has made it to the AS/400, there are a number of options available, such as how many versions to save, how long to keep each version, and when to move the data to tape.
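Retention settings such as "how many versions to save" and "how long to keep each version" are easiest to reason about as a rule applied to a list of backup dates. The sketch below is a simplified, hypothetical rule in Python (keep at most N versions, none older than D days); ADSM's actual policy settings are richer and are configured in the product, not coded like this.

```python
from datetime import date, timedelta


def versions_to_keep(backup_dates, max_versions, max_age_days, today):
    """Keep at most `max_versions` backups, none older than `max_age_days`.

    A simplified, hypothetical retention rule; returns dates newest first.
    """
    cutoff = today - timedelta(days=max_age_days)
    recent = sorted((d for d in backup_dates if d >= cutoff), reverse=True)
    return recent[:max_versions]


backups = [date(2000, 4, 1), date(2000, 4, 10), date(2000, 4, 20), date(2000, 4, 22)]
print(versions_to_keep(backups, max_versions=3, max_age_days=10, today=date(2000, 4, 24)))
```

Here the age limit, not the version count, decides: only the two backups taken within the last ten days survive.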
Another IBM software product that adds to the HSM toolkit is Backup Recovery and Media Services (BRMS) for AS/400. This program is a sophisticated tape librarian capable of automatically retrieving archived objects, that is, objects saved in a tape library with their storage freed.
Of course, there are many storage and retrieval vendors besides IBM, each with its own unique products. For example, LXI Corp. has a product called Media Management System (MMS), a tape management storage and retrieval system for the AS/400. It also has a newer product, Tape Management System (TMS/ix), which runs across both AS/400 and UNIX boxes simultaneously. Many other vendors provide similar services and products. Check the Online Yellow Pages on the Midrange Computing Web site at www.midrangecomputing.com/yellowpages for more vendors providing competing products and services.
Build Your Own Solution
Even if you don't have an AS/400 large enough to have tape libraries and user ASPs, or a budget large enough to purchase all of the software mentioned, you can still build your own HSM solution. Don't believe it? Then consider that the AS/400 already implements HSM. For example, we, as programmers, do not need to concern ourselves with where OS/400 stores our objects across the various storage layers. When we call a program, regardless of where it resides in the storage hierarchy, it is transparently loaded into main memory and then executed. The first call loads the program from disk into memory. Subsequent calls may find the program already in main memory or, perhaps, still in cache. We need only intercede when a requested object no longer exists online.
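The transparent loading described above behaves much like a read-through cache. The following Python sketch models it as an analogy only (OS/400's single-level storage is not implemented this way): the first access reads from "disk", repeat accesses are served from a small least-recently-used "memory", and the caller cannot tell which happened. The program names are placeholders.

```python
from collections import OrderedDict


class TwoLevelStore:
    """Toy model of transparent loading: every object lives on 'disk', and the
    first access copies it into a small 'memory' that evicts the least recently
    used object when full. Callers just call get()."""

    def __init__(self, disk, memory_slots):
        self.disk = disk                  # name -> object: the online copy
        self.memory = OrderedDict()       # name -> object, most recently used last
        self.memory_slots = memory_slots
        self.disk_reads = 0

    def get(self, name):
        if name in self.memory:           # already resident: no disk access
            self.memory.move_to_end(name)
            return self.memory[name]
        obj = self.disk[name]             # KeyError: the object is not online at all
        self.disk_reads += 1
        self.memory[name] = obj
        if len(self.memory) > self.memory_slots:
            self.memory.popitem(last=False)   # evict the least recently used object
        return obj


store = TwoLevelStore({"PGMA": "code A", "PGMB": "code B", "PGMC": "code C"}, memory_slots=2)
for name in ("PGMA", "PGMA", "PGMB", "PGMC", "PGMA"):
    store.get(name)
print(store.disk_reads)  # 4
```

The KeyError raised for a name that is not on disk at all corresponds to the one case in which we must intercede.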
IBM has provided APIs to make the creation of a custom-built HSM solution possible. The APIs are Move Library to ASP (QHSMMOVL), Move Root Folder to ASP (QHSMMOVF), Move Spooled File (QSPMOVSP), and Save *STMF with Storage Free (Qp0lSaveStgFree).
The first two APIs, QHSMMOVL and QHSMMOVF, do exactly what their titles suggest: They move a library or a root folder, respectively, from one auxiliary storage pool to another. Before performing the actual move, they ensure that sufficient space exists in the target ASP to permit a successful transfer. Neither of these APIs, however, is useful if there are no user ASPs on your AS/400.
The third API, QSPMOVSP, moves a spool file from one output queue either to the top of another output queue or to a position after another spool file in the same queue. If the target queue is in a different ASP from the source, a test for adequate space is performed, and, as with the first two APIs, the move is carried out only if the test succeeds. There are some restrictions on the use of these APIs. It is not possible to move system libraries (those beginning with the letter Q), libraries that contain objects not permitted in a user ASP, or libraries containing objects that can't be renamed. The QSYSWRK subsystem must be active, since it handles migration jobs, and the objects to be moved must not be allocated by another job. In short, any limitation that normally applies to the location and use of an object is still enforced when you use one of these APIs.
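The pre-move checks amount to a validation step that runs before anything is touched. This Python sketch models only the restrictions named above (a Q-prefixed name, an inactive QSYSWRK, an object allocated by another job, insufficient space in the target ASP); the library names and sizes are hypothetical, and the real APIs enforce further rules.

```python
from dataclasses import dataclass


@dataclass
class Library:
    name: str
    size: int                          # storage the library occupies (arbitrary units)
    allocated_elsewhere: bool = False  # locked by another job?


def move_problems(lib, target_free, qsyswrk_active):
    """List the reasons a move would be refused; an empty list means go ahead.

    Models only the restrictions named in the text; the real APIs check more.
    """
    problems = []
    if lib.name.upper().startswith("Q"):
        problems.append("system library")
    if not qsyswrk_active:
        problems.append("QSYSWRK not active")
    if lib.allocated_elsewhere:
        problems.append("allocated by another job")
    if lib.size > target_free:
        problems.append("not enough space in target ASP")
    return problems


print(move_problems(Library("ARHIST", 500), target_free=1000, qsyswrk_active=True))  # []
print(move_problems(Library("QGPL", 10), target_free=1000, qsyswrk_active=True))     # ['system library']
```

Collecting every problem, rather than stopping at the first, makes the sketch more useful as a report for whoever schedules the migration.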
Some of the functionality of these HSM APIs is already available in CL commands. For example, the functionality of the QSPMOVSP API is available in the Change Spooled File Attributes (CHGSPLFA) command. Figure 1 is a program that demonstrates the use of the QSPMOVSP API to move a spool file from one queue to another. You could just as easily use the CHGSPLFA command to perform the same task, but the APIs provide more flexibility because they can be called directly from a high-level language. The SAV command, on the other hand, cannot save a stream file object with its storage freed, as the Qp0lSaveStgFree API can, so in this case the API has the clear advantage.
Weighing Your Options
If you plan to work with these or any of the other OS/400 APIs, be sure to load licensed program 5769SS1 Option 13 of OS/400: System Openness Includes. You can find it on the OS/400 installation media and load it through the licensed program menu (GO LICPGM). Option 13 loads the library QSYSINC onto the AS/400. This library contains source files whose members you can copy into your programs; the members contain predefined data structures that make accessing the APIs much easier. For example, member QUSEC in file QSYSINC/QRPGLESRC provides the ILE RPG data structure for the error code parameter common to many of the APIs. The comments included in the members provide some good information about the APIs, so they're worth studying.
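The error code structure that QUSEC describes is a fixed 16-byte header followed by exception data: a 4-byte binary "bytes provided" field, a 4-byte binary "bytes available" field, a 7-character exception ID, and a reserved byte. As an off-platform illustration, this Python sketch packs and unpacks that layout, assuming big-endian binary fields and EBCDIC (code page 037) text; on the AS/400 itself you would simply copy QUSEC into your program. The message ID in the simulation is just a sample value.

```python
import struct

# Bytes provided, bytes available, exception ID, 1 reserved byte: 16 bytes total.
HEADER = struct.Struct(">ii7sx")


def new_error_code(bytes_provided=116):
    """Caller side: a zeroed buffer whose first field tells the API how much room it has."""
    buf = bytearray(bytes_provided)
    struct.pack_into(">i", buf, 0, bytes_provided)
    return buf


def read_error_code(buf):
    """Return None if no error was reported, else (exception ID, exception data)."""
    provided, available, raw_id = HEADER.unpack_from(buf, 0)
    if available == 0:
        return None
    data_len = min(available, provided) - HEADER.size  # data follows the header
    exception_id = raw_id.decode("cp037")              # OS/400 text is EBCDIC
    return exception_id, bytes(buf[HEADER.size:HEADER.size + data_len])


buf = new_error_code()
print(read_error_code(buf))  # None

# Simulate an API reporting an error with 5 bytes of exception data.
struct.pack_into(">i", buf, 4, 16 + 5)
buf[8:15] = "CPF9898".encode("cp037")
buf[16:21] = b"\x00\x00\x00\x2a\x00"
msg_id, data = read_error_code(buf)
print(msg_id, len(data))  # CPF9898 5
```

Note that the length of the exception data is bytes available minus the 16-byte header, which matters when you pass that data on, for example to QMHSNDPM.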
Depending upon your current AS/400 configuration, your reaction to these APIs could fall anywhere between boredom and excitement. If you have one or more user ASPs, you can write software that uses the APIs to move static information from the system ASP to a user ASP. That user ASP needs to be backed up only when new information is added to it. Thus, you can easily improve your backup times with just a little creativity.
Even if you find yourself uninspired by the APIs mentioned in this article, there is one exciting HSM opportunity available for AS/400s of all sizes. To take advantage of it, you need to load 5769SS1 Option 18 of OS/400: Media and Storage Extensions (MSE). This option provides a user exit program and OS/400 exit points that allow a user program to be called whenever an archived version of an object needs to be restored to the system to satisfy a system request.
An object is considered to be archived when you have saved it to an offline medium, such as tape, and have specified that the storage for the original object should be freed. This causes the object to be saved and then its contents to be deleted from the system. The description of the object, including the save information (such as the device type and volume ID of the save medium), remains on the system. Without MSE, whenever access to an archived object is attempted, OS/400 returns a variety of error messages to the caller that all amount to the same thing: The object is inaccessible. With MSE and one of your own programs registered to an MSE exit point, you can intercede. The program can then do something as simple as prompting the user to insert the appropriate tape, restoring the object, and exiting. OS/400 then attempts to access the object again and finds it completely intact. Meanwhile, the program that attempted to access the archived object never knows what happened. Instead of having to code special logic into each program to ensure object accessibility, the function can be pushed down to the operating system level. This is akin to using database triggers to enforce business rules.
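The interception described above can be modeled in a few lines. This Python sketch is a toy, not the MSE interface: an archived object keeps its description (here, only the volume ID) but loses its contents, and any registered exit routine gets a chance to restore it before the access fails. The object name, volume ID, and the in-memory "tape" are hypothetical.

```python
class ObjectStore:
    """Toy model of saving with storage freed plus an MSE-style exit point."""

    def __init__(self):
        self.contents = {}        # object name -> data, only while online
        self.save_volume = {}     # object name -> volume ID kept in the description
        self.exit_programs = []   # callables: (store, name, volume) -> None

    def save_storage_free(self, name, tapes, volume):
        # Copy the data to the "tape", free it online, keep the description.
        tapes.setdefault(volume, {})[name] = self.contents.pop(name)
        self.save_volume[name] = volume

    def read(self, name):
        if name not in self.contents and name in self.save_volume:
            for exit_program in self.exit_programs:   # give each exit a chance
                exit_program(self, name, self.save_volume[name])
                if name in self.contents:
                    break                             # restored; stop calling exits
        if name not in self.contents:
            raise LookupError(f"{name} is not accessible")
        return self.contents[name]


tapes = {}                        # volume ID -> {object name: data}
store = ObjectStore()
store.contents["ARHIST"] = "1999 receivables"
store.save_storage_free("ARHIST", tapes, "VOL001")


def restore_from_tape(store, name, volume):
    # A real exit program might first ask the operator to mount `volume`.
    store.contents[name] = tapes[volume][name]


store.exit_programs.append(restore_from_tape)
print(store.read("ARHIST"))  # 1999 receivables
```

The caller's `read()` is identical whether or not the object was offline, which is the point of pushing the logic down to the operating system level.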
Option 18, Media and Storage Extensions, is not free. It costs $995 regardless of machine size. It is provided on the OS/400 installation media and can be installed through the licensed program menu (GO LICPGM). As with all IBM V4R4 software, you can try it for 70 days before you buy it, so you do have some time to play before you pay.
To really take advantage of HSM, you need to design it into your applications. The APIs discussed in this article work on entire libraries or root directory trees. Saving objects, such as files, causes them to be saved as complete entities. So, in both cases, you may not have the granularity you need for HSM. By redesigning your systems and applications so that you can split things into more logical units, perhaps by their relative value, you'll be able to keep both system throughput and human throughput optimized.
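One way to get that granularity is to split data by a value-related attribute, such as fiscal year, so that each unit can be moved or archived independently. The Python sketch below assumes records carry a fiscal year and uses hypothetical library-style names (ARHST1998 and so on); the two-year cutoff is an arbitrary example.

```python
from collections import defaultdict


def partition_by_year(records, prefix="ARHST"):
    """Group (record key, fiscal year) pairs into one unit per year.

    Unit names such as ARHST1998 are hypothetical library names; each unit can
    then be moved, saved, or archived on its own.
    """
    units = defaultdict(list)
    for key, year in records:
        units[f"{prefix}{year}"].append(key)
    return dict(units)


def units_to_archive(units, current_year, keep_years=2):
    """Units at least `keep_years` old are candidates for tape or optical."""
    return sorted(u for u in units if int(u[-4:]) <= current_year - keep_years)


records = [("INV001", 1998), ("INV002", 1999), ("INV003", 2000), ("INV004", 1998)]
units = partition_by_year(records)
print(units)                          # {'ARHST1998': ['INV001', 'INV004'], 'ARHST1999': ['INV002'], 'ARHST2000': ['INV003']}
print(units_to_archive(units, 2000))  # ['ARHST1998']
```

Once the data is split this way, a whole-library API such as QHSMMOVL operates on exactly the slice of data whose value has dropped.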
References and Related Materials
Complementing AS/400 Storage Management Using Hierarchical Storage Management APIs, Redbook (SG24-4450-01)
      * TO COMPILE:
      *   CRTBNDRPG PGM(xxx/MOVSPLFR) SRCFILE(xxx/QRPGLESRC)
      ** MOVSPLFR - This program demonstrates the QSPMOVSP and QMHSNDPM APIs
      * References: System API Reference Manual SC41-5801
      *   file: QSYSINC/QRPGLESRC,QUSEC
      *   file: QSYSINC/QRPGLESRC,QSPMOVSP
      ** MODIFICATION HISTORY
      ** 2000-04-24 Initial incarnation. Barry L. Kline
      * Copy the API headers provided for QSPMOVSP. The /copy member
      * defines two data structures, one for each function of QSPMOVSP.
      * We use QSPF0100 to move the file to a new output queue. The copy
      * comes first because the fields below are defined LIKE its subfields.
      /copy QSYSINC/QRPGLESRC,QSPMOVSP
      ** Fields used as parameters to this program
     D srcJobName      S                   like(qspsjn01)
     D srcJobUser      S                   like(qspsjun01)
     D srcJobNumber    S                   like(qspsjnbr01)
     D srcSplFile      S                   like(qspssfn)
     D srcSplFileNbr   S              5P 0
     D targetQueue     S                   like(qsptoqn)
     D targetQueueLib  S                   like(qsptoql)
      * Fields used as QMHSNDPM parameters
     D callStack       S             10A   inz('*')
     D msgData         S            100A
     D msgFname        S             20A   inz('QCPFMSG   *LIBL')
     D msgId           S              7A
     D msgKey          S              4A
     D msgLen          S              8B 0
     D msgType         S             10A   inz('*DIAG')
      * Counter 2 skips this program's entry procedure to reach its caller
     D stackCounter    S              8B 0 inz(2)
      * Fields used as QSPMOVSP parameters
     D fmtName         S              8A
     D fmtLength       S              8B 0
      * Error code parameter (format ERRC0100) used by both APIs, modified
      * from QSYSINC/QRPGLESRC,QUSEC to add room for exception data:
      * bytes provided, bytes available, exception ID, reserved, data
     D qusec           DS
     D qusbprv                 1      4B 0
     D qusbavl                 5      8B 0
     D qusei                   9     15A
     D quserved               16     16A
     D qused01                      100A
      ** Program entry.
     C     *entry        plist
     C                   parm                    srcJobName
     C                   parm                    srcJobUser
     C                   parm                    srcJobNumber
     C                   parm                    srcSplFile
     C                   parm                    srcSplFileNbr
     C                   parm                    targetQueue
     C                   parm                    targetQueueLib
      * Fill in QSPF0100 fields
     C                   eval      QSPSJN01 = srcJobName
     C                   eval      QSPSJUN01 = srcJobUser
     C                   eval      QSPSJNBR01 = srcJobNumber
     C                   clear                   QSPSIJID01
     C                   clear                   QSPISFID
     C                   eval      QSPSSFN = srcSplFile
     C                   eval      QSPSSFN00 = srcSplFileNbr
     C                   eval      QSPTOQN = targetQueue
     C                   eval      QSPTOQL = targetQueueLib
      * Fill in the fields required for the call to QSPMOVSP
     C                   eval      fmtName = 'MSPF0100'
     C                   eval      fmtLength = %size(QSPF0100)
      * Tell the API how much room the error code structure has, so that
      * errors are returned in QUSEC instead of being signaled as exceptions
     C                   eval      qusbprv = %size(qusec)
      * Move the spool file.
     C                   call      'QSPMOVSP'
     C                   parm                    QSPF0100
     C                   parm                    fmtLength
     C                   parm                    fmtName
     C                   parm                    qusec
      * If any errors occurred, pass them back to the caller of this program.
      * The exception data length is bytes available less the 16-byte header.
     C                   if        qusbavl > 0
     C                   eval      msgLen = qusbavl - 16
     C                   if        msgLen > %size(msgData)
     C                   eval      msgLen = %size(msgData)
     C                   endif
     C                   call      'QMHSNDPM'
     C                   parm      qusei         msgId
     C                   parm                    msgFname
     C                   parm      qused01       msgData
     C                   parm                    msgLen
     C                   parm                    msgType
     C                   parm                    callStack
     C                   parm                    stackCounter
     C                   parm                    msgKey
     C                   parm                    qusec
     C                   endif
     C                   eval      *inlr = *on
Figure 1: Here's an example of using the HSM APIs to move a spool file.