Last week I ran a job that called CL program A, which started REXX procedure B. This REXX procedure repeatedly called CL program C. After running for several hours, my job aborted because the job message queue had exceeded its maximum allowed size, which is set by system value QJOBMSGQTL.
The job log contained thousands upon thousands of CALL PGM(C) statements, one after another. After much head-scratching, I recompiled both CL programs with LOG(*NO) and resubmitted the job. Several hours later, it aborted again. Then I changed the job to LOG(0 99 *NOLIST) so that no job log would be produced, but the result was the same. This time it was worse: without a job log, I had no clue as to what went wrong.
I really didn't want to change QJOBMSGQTL. Raising it made sense only if I could set it to *NOMAX, and that wasn't acceptable. The largest numeric value is 32767 KB, or roughly 32 MB. With *NOMAX, there would be no limit at all, and I figured my system might crash when the queue used up the available disk space.
Fortunately, the solution was rather simple. I added the Remove Message (RMVMSG) command to the REXX procedure so that it would be executed every so often:
RMVMSG PGMQ(*SAME QREXX) CLEAR(*ALL)
This command removes all messages from program message queue QREXX, which is the queue the REXX procedure uses. I resubmitted the job, and this time it ran fine. The job message queue never consumed much space; it probably never grew beyond 1024 bytes. While the job was running in batch, I noticed an interesting side effect: the job log kept growing and shrinking. It grew with each CALL PGM(C), then shrank back each time the procedure reached the RMVMSG statement.