From: Richard Schoen To: All
For all who care, here is a gotcha regarding subsystem QINTER. I have encountered a situation where the users on the AS/400 will be working away and then, all of a sudden, the screens will lock up as if someone were running a processor-intensive job that has taken over the CPU. When WRKACTJOB is run from the system console, the CPU utilization shows 99-100 percent, although none of the user jobs is actually using more than 1-2 percent of the CPU. Finally, 30 minutes and many user phone calls later, the system console comes back with a nice system message saying that subsystem QINTER has ended, and all of the user screens go blank. It also displays a message explaining why subsystem QINTER ended: the job message queue for QINTER could not be extended. QINTER is then restarted and the system goes along its merry way.
The moral of the story is that you should end and restart your subsystems on a periodic basis during the week; in our case, at least once per day. If you have a lot of users and a lot of jobs running, and you don't reset your subsystems often enough, you may eventually run into this problem. Hopefully, this dose of preventive medicine will help someone out there.
From: Ernie Malaga To: Richard Schoen
The reason for this hangup with QINTER is not related to subsystems, but to the job's message queue size. When a job starts, the system allocates space for its job message queue; its initial size is indicated by system value QJOBMSGQSZ, and the maximum size it can have is in QJOBMSGQTL.
Neither system value accepts *NOMAX, and this is good because a job like QINTER can run continuously for months if the system operator doesn't end it or IPL the system. *NOMAX would make the system use up every byte of DASD long before that.
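To see what the two system values mentioned above are set to on a given machine, something like the following should work from a command line. This is only a sketch: it assumes your release still carries both system values under these names and that your user profile is allowed to display them.

```cl
/* Display the initial size of a job message queue */
DSPSYSVAL  SYSVAL(QJOBMSGQSZ)

/* Display the size limit the post describes as the maximum */
DSPSYSVAL  SYSVAL(QJOBMSGQTL)
```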
Unfortunately, you cannot change QINTER's job to prevent stuff from being logged into its job message queue because it's a (sub?)system job. On regular jobs, you could CHGJOB LOG(0 99 *NOLIST), but I just tried it on QINTER and it wouldn't let me.
It looks like the only solution is to end QINTER periodically and start it again.
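As a rough illustration of that end-and-restart cycle, a small CL program along these lines could be submitted during off-hours. It is a sketch, not a tested procedure: the program name RSTQINTER, the five-minute controlled-end delay, and the six-minute wait are all assumptions, and the message IDs monitored should be checked against your release. It must run from a job outside QINTER (for example, in batch or from the console in the controlling subsystem), because any job running in QINTER ends with the subsystem.

```cl
/* RSTQINTER (hypothetical name): end and restart subsystem QINTER  */
PGM
    /* End QINTER in a controlled manner, allowing interactive      */
    /* users 300 seconds to sign off before their jobs are ended.   */
    ENDSBS     SBS(QINTER) OPTION(*CNTRLD) DELAY(300)
    /* CPF1054 is assumed to mean the subsystem was not active.     */
    MONMSG     MSGID(CPF1054)

    /* ENDSBS returns before the subsystem has actually ended, so   */
    /* wait past the controlled-end delay (assumed to be enough).   */
    DLYJOB     DLY(360)

    /* Start QINTER again; this gives it a fresh job message queue. */
    STRSBS     SBSD(QINTER)
    /* CPF1010 is assumed to mean the subsystem is already active.  */
    MONMSG     MSGID(CPF1010)
ENDPGM
```

The fixed wait is the weak point of this sketch. If QINTER takes longer to end than the delay allows, the STRSBS will fail, so an operator should still confirm that the subsystem came back up.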
From: Pete Hall To: Richard Schoen
I ran into a similar situation with a job message queue. Someone had changed the job description to LOGCLPGM(*YES), and the queue filled up rather quickly.
There was a horrendous amount of performance degradation, and then a message was issued to QSYSOPR, indicating that the job message queue had been reorganized, and advising that the program might need to take some action if the message was issued frequently. I believe this was repeated twice more, and then the job had to be terminated because the message queue could not be extended.
Obviously, it is not a good idea to set LOGCLPGM(*YES), but maybe there is something else happening specifically when a message queue becomes full. Maybe we could either get IBM to fix it, or figure out how to protect ourselves.
From: Richard Schoen To: All
I had a chance to check IBMLink today on the QINTER problem that I had described previously. It turns out that IBM's answer to this problem is as I thought. In order to clear the job message queue for a subsystem that is active, you have to end and restart the subsystem. Now that is quite a hole in such a sophisticated operating system as OS/400.
From: David Knittle To: Richard Schoen
In many cases, you can prolong the time between QINTER starts and stops, though you still must occasionally shut the subsystem down. What I have found is that many of the messages logged to the QINTER subsystem have to do with device errors, etc. We have used these messages to target problem areas, and after correction, found our QINTER subsystem job message queue to be relatively empty.