Practical RPG: Handling Abnormal Termination in Servers


One of the hard parts of any server-based architecture is knowing when your server has been canceled; this article shows you how to address that.


I saw a message on one of the mailing lists the other day about the old standby for termination handling: the scope message. Scope messages are cool; using the QMHSNDSM API, you can identify a program to call when the job or call stack entry ends, and I'll address that another time. But I really like the CEERTX API. CEERTX is the ILE equivalent of the concept: it allows you to identify a procedure to be called if your call stack entry is terminated.

When Do You Need This?

This problem comes up all the time in any sort of multi-tiered architecture. Let's say I have a program running on my PC (or tablet or phone or…) that sends a request to the IBM i, and the server program handling that request encounters an error bad enough to cause a hard halt. I know, your programs never get hard halts, but mine do occasionally. Anyway, here's the problem: the operator cancels the server job. Now what? How do we let the requesting program know that an error has occurred? Well, there are several options, most of them involving a timeout. Essentially, the design is that when the application senses a timeout, it tells the user and the operation aborts. But there's a fundamental issue with a timeout: how long do you wait?


If the timeout is too short, you can get false timeouts. In the case above, let's say the problem was an object lock from an unexpected save operation. OK, as soon as the save ends, the operator hits R for retry and the server continues. However, if the client-side application has already timed out, then there's no place for the response to go and the retry is for naught. So, make the timeout really long, right? The problem with that approach is that the longer the timeout, the better the chance that the user will get frustrated and terminate the client when a real timeout occurs.


So no matter how you tune it, the timeout is never the perfect solution. The better solution is always a positive termination message to the user: "Server canceled, please retry and contact support if this message appears again."

Getting Positive Negative Feedback

You may have noticed that I used the phrase "positive termination message." That's a little oxymoronic, but all it means is that the error condition is identified by a concrete error response, not simply the lack of any response as is the case in a timeout. Clearly, being able to send a message to the user as soon as the server is canceled would be perfect: it would provide an immediate response to the user, who could either retry the operation or contact support if the error is persistent. For example, if the client is waiting for a message from a data queue, it would be nice if an abnormal termination of the server sent a message to that queue to notify the client. Unfortunately, in a typical operating system, when you cancel a program, it's done; it doesn't get a chance to tell anyone. However, the IBM i is not a typical platform, and it makes some powerful APIs available. QMHSNDSM and CEERTX are perfect examples of that power. Allow me to present an example of CEERTX.


The program is simple. If you call this program, it waits on the data queue MCPQ in library MCP. This is a 32-character data queue created with the command CRTDTAQ DTAQ(MCP/MCPQ) MAXLEN(32). The program waits for 60 seconds. If it receives a value from the queue, it sends it to QSYSOPR. If it times out, it sends "Timeout" to QSYSOPR. And that would be the normal way of things. However, by making use of CEERTX, I am able to provide a third option, Canceled, which occurs if the job or the call stack entry is canceled. I'll walk you through the code.


     h dftactgrp(*no) actgrp(*new) option(*srcstmt:*nodebugio)

     d SendMessage     pr
     d   Message                     60    const

     d Cancel          pr
     d   TokenIn                       *



First, there's my standard H-spec (DFTACTGRP(*NO) is required because the program contains subprocedures), as well as the required internal prototypes for my pre-6.1 friends. If you're at V6.1 or later, you don't need prototypes for internal procedures. Hallelujah!



      * External prototypes

      * CEERTX: register a call stack entry termination exit procedure
     dCEERTX           pr                  extproc('CEERTX')
     d   pCanclHdlr                    *    const procptr
     d   pToken                        *    const options(*omit)
     d   FeedBack                    12     options(*omit)

      * QRCVDTAQ: receive an entry from a data queue
     dQRCVDTAQ         PR                  EXTPGM('QRCVDTAQ')
     d   DQName                      10    const
     d   DQLibr                      10    const
     d   DQLen                        5p 0
     d   DQData                      32
     d   DQWait                       5p 0 const

     d DQLen           s              5p 0
     d DQData          s             32

      * QCMDEXC: run a CL command
     dQCMDEXC          PR                  EXTPGM('QCMDEXC')
     d   Command                     80    const
     d   CmdLen                      15p 5 const



Next, my external prototypes. These are for the CEERTX API (which I'll discuss in a moment), the QRCVDTAQ API used to receive a message from a data queue, and the ubiquitous QCMDEXC API. Notice that QRCVDTAQ and QCMDEXC use the EXTPGM keyword; that's because they're programs, not procedures. They're also not the focus of this article, so that's all we'll say about them here. CEERTX, on the other hand, is a procedure exported from the QLEAWI service program and magically bound into the program at compile time. The most important parameter (and in fact the only required parameter) is a pointer to the procedure to be called when the call stack entry is terminated abnormally (that is, by anything other than a return). The procedure needs a specific signature, and we'll see that momentarily. But first, let's take a look at the program.





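      /free
       // Register CANCEL to run if this call stack entry ends by
       // anything other than a normal return
       CEERTX(%paddr('CANCEL'): *omit: *omit);

       // Wait up to 60 seconds for an entry on MCP/MCPQ
       QRCVDTAQ('MCPQ': 'MCP': DQLen: DQData: 60);

       if DQLen = 0;
         SendMessage('Timeout');
       else;
         SendMessage('Received: ' + DQData);
       endif;

       *inlr = *on;
      /end-free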



There's not a lot there. First, we register the cancel handler by passing the address of the CANCEL procedure. Remember that in RPG, even if your source code spells the procedure name in mixed case, the compiler treats the name as uppercase, so be sure to use uppercase when specifying the procedure within the %paddr expression. The other parameters are omitted: the first *OMIT means that no token is passed to the cancel procedure, and the second *OMIT (in place of the feedback code) means that if the registration fails, CEERTX signals an escape message instead of returning an error code. After that, the logic is pretty simple: call QRCVDTAQ, waiting for 60 seconds. If no data is returned (DQLen = 0), send a timeout message; otherwise, send the data received.
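If you'd rather handle a registration failure yourself than receive an escape message, you can pass a 12-byte feedback area as the third parameter instead of *OMIT. Here's a minimal sketch of that variation. The subfield names in the fc data structure are my own; the layout (a 2-byte severity, a 2-byte message number, a 1-byte case/severity/control field, a 3-character facility ID, and 4 bytes of instance-specific information) follows the ILE condition token, and I'm assuming, as with the other CEE APIs, that a severity of zero means the call succeeded.

```rpgle
     h dftactgrp(*no) actgrp(*new) option(*srcstmt:*nodebugio)

      * CEERTX prototype; the third parameter receives the
      * 12-byte feedback code (condition token)
     dCEERTX           pr                  extproc('CEERTX')
     d   pCanclHdlr                    *    const procptr
     d   pToken                        *    const options(*omit)
     d   FeedBack                    12     options(*omit)

     d Cancel          pr
     d   TokenIn                       *

      * Condition token layout (12 bytes); names are illustrative
     d fc              ds
     d   MsgSev                       5i 0
     d   MsgNo                        5i 0
     d   CaseSevCtl                   1a
     d   FacilityId                   3a
     d   ISInfo                      10i 0

     d Msg             s             52

      /free
       // Pass a feedback area instead of *OMIT so that a failure
       // is returned in fc rather than signaled as an escape message
       CEERTX(%paddr('CANCEL'): *omit: fc);

       // Assumption: severity 0 means the registration succeeded
       if MsgSev <> 0;
         Msg = 'CEERTX failed: ' + FacilityId + ' severity '
             + %char(MsgSev);
       else;
         Msg = 'Cancel handler registered';
       endif;
       dsply Msg;

       *inlr = *on;
      /end-free

     p Cancel          b
     d                 pi
     d   TokenIn                       *
      /free
       dsply 'Canceled';
      /end-free
     p                 e
```

Compiled with CRTBNDRPG and called, this should just report the registration; because the program then ends with a normal return, Cancel is not called.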


     p SendMessage     b
     d                 pi
     d   Message                     60    const
      /free
       // Send the text to the system operator message queue
       QCMDEXC('SNDMSG MSG(''' + %trim(Message) + ''') TOUSR(*SYSOPR)': 80);
      /end-free
     p                 e



This is the message routine. It's trivial: just send the message to QSYSOPR using SNDMSG and the QCMDEXC API. If this were a real server program, we would instead send the appropriate message back to the client application.
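If the client waits on a data queue, as in the scenario described earlier, replying to it is a small change. Here's a minimal sketch, assuming a hypothetical reply queue MCP/MCPREPLY created with CRTDTAQ DTAQ(MCP/MCPREPLY) MAXLEN(60). The SendReply name is also mine, and the main procedure simply sends one status to show the call.

```rpgle
     h dftactgrp(*no) actgrp(*new) option(*srcstmt:*nodebugio)

      * QSNDDTAQ: send an entry to a data queue
     dQSNDDTAQ         PR                  EXTPGM('QSNDDTAQ')
     d   DQName                      10    const
     d   DQLibr                      10    const
     d   DQLen                        5p 0 const
     d   DQData                      60    const

     d SendReply       pr
     d   Message                     60    const

      /free
       // Stand-in for the server's normal and error paths
       SendReply('Timeout');
       *inlr = *on;
      /end-free

     p SendReply       b
     d                 pi
     d   Message                     60    const
      /free
       // Write the status text to the reply queue the client is
       // waiting on (hypothetical queue MCP/MCPREPLY)
       QSNDDTAQ('MCPREPLY': 'MCP': %size(Message): Message);
      /end-free
     p                 e
```

The same SendReply call can be made from the cancel handler, which is what turns a canceled job into a concrete message the client can act on.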


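     p Cancel          b
     d                 pi
     d   TokenIn                       *
     d x               s              3u 0
      /free
       // Called by the system when this call stack entry is canceled
       SendMessage('Canceled');
      /end-free
     p                 e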


And this is the magic. Cancel is called if the program is terminated in any way other than a normal return. Note that the procedure interface defines a single pointer parameter. If you had specified a pointer as the second parameter to CEERTX (rather than *OMIT as shown in the example), that pointer would appear here. It's a nifty way to allow a single procedure to handle lots of cancel conditions, but that's a little outside the scope of today's lesson. Instead, all this procedure does is send a Canceled message to QSYSOPR. Remember that this is a trivial example; a real handler would send a predefined termination message to the client application.
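Here's a minimal sketch of the token idea: the program registers Cancel with a pointer to a small data structure describing which client to notify, and the handler overlays that structure with a based data structure. The ClientInfo structure, its field names, and the MCP/MCPREPLY queue are hypothetical, and I'm assuming, as I read the CEERTX documentation, that the handler receives the same pointer value that was registered. To keep it short, the handler only displays the target instead of sending to it.

```rpgle
     h dftactgrp(*no) actgrp(*new) option(*srcstmt:*nodebugio)

     dCEERTX           pr                  extproc('CEERTX')
     d   pCanclHdlr                    *    const procptr
     d   pToken                        *    const options(*omit)
     d   FeedBack                    12     options(*omit)

     d Cancel          pr
     d   TokenIn                       *

      * Hypothetical per-request context handed to the handler;
      * global, so it still exists when the handler runs
     d ClientInfo      ds
     d   ReplyQ                      10
     d   ReplyLib                    10

     d pClient         s               *

      /free
       ReplyQ = 'MCPREPLY';
       ReplyLib = 'MCP';
       pClient = %addr(ClientInfo);

       // Register Cancel, passing the context pointer as the token
       CEERTX(%paddr('CANCEL'): pClient: *omit);

       // ... the request would be processed here ...

       *inlr = *on;
      /end-free

     p Cancel          b
     d                 pi
     d   TokenIn                       *

      * Overlay the registered context at the token's address
     d Info            ds                  based(pInfo)
     d   InfoQ                       10
     d   InfoLib                     10
     d Msg             s             52
      /free
       pInfo = TokenIn;
       Msg = 'Canceled; notify ' + %trim(InfoLib) + '/' + %trim(InfoQ);
       dsply Msg;
      /end-free
     p                 e
```

As written, the program ends with a normal return, so Cancel won't run; to see the handler fire, put a wait (such as the QRCVDTAQ call from the main program) where the request would be processed and then cancel the job.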

Making It Work

Once you've got the program compiled, you can test it. You could call the program and then wait 60 seconds; the program will end, and you'll get a "Timeout" message on the system operator message queue. Or you could submit the call with SBMJOB and then send an entry to the data queue using the QSNDDTAQ API: CALL QSNDDTAQ (MCPQ MCP x'00032F' 'OK!'). The third parameter, x'00032F', is the data length (32) as a five-digit packed decimal value. You'll see "Received: OK!" in the message queue instead.


However, if you kill the submitted job, even using ENDJOB OPTION(*IMMED), you'll see the message "Canceled" in the system operator message queue. And that's the point of this article: whether you end the job or just the call, you'll still be notified! Even cooler: call the program from the command line and then cancel it using System Request option 2 (End previous request). You'll still get the Canceled message!


So as you continue your trip into the world of multi-tiered applications, keep this API in your tool belt. I guarantee it will make handling server errors a whole lot easier.