Detailed security, job, and performance monitors provide granular reporting designed to further enhance system availability.
CCSS today announced the release of 12 new data definitions for QSystem Monitor (QSM), its leading performance monitoring and reporting solution. The new data definitions provide QSM users with the opportunity for greater insight into potential problems on the system without carrying out a protracted investigative process. The new monitors are designed to save users valuable time in their daily tasks and reduce risks to system availability.
Monitoring the communication status of an IP interface has been a longstanding feature of QSM, which users have deployed to support their availability monitoring protocols. The method relied on TCP ping technology to return either an active or inactive status at intervals determined by the user. As part of the new data definitions, CCSS has created a unique alternative means for users to check the status of an IP address that bypasses traditional TCP protocols. This new monitor type returns the current status of a single interface and also supports commands that allow the user to start and end the interface. The status is returned as a number, which is translated into a readable status by the monitor's threshold. It is ideal for environments such as the gaming industry, where security requirements around TCP are especially rigorous.
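As a rough illustration of how a numeric value can be translated into a status by a threshold, consider the minimal Python sketch below. It is not QSM code: the function name, the numeric codes, and the default threshold are all hypothetical, chosen only to show the idea.

```python
def translate_status(code, active_threshold=1):
    """Translate a numeric interface status into a label.

    Hypothetical convention: values at or above the threshold mean the
    interface is active; anything lower means it is inactive.
    """
    return "active" if code >= active_threshold else "inactive"


if __name__ == "__main__":
    for code in (0, 1):
        print(code, translate_status(code))
```

In a real deployment the numeric values and the threshold would be whatever the monitor defines.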
Many of the new data definitions address the considerations of job monitoring. As IT managers know, important batch jobs can be delayed if they are put on hold, if the queue they are waiting in is put on hold, or if they are in a queue that is not attached to a subsystem. While the ability to hold jobs in this way is both useful and necessary, it can cause problems for operators who are unaware that a hold status exists at the job, queue, or subsystem level. This may be particularly true when scheduling large jobs during off-peak hours. Four new data definitions tackle this issue by offering immediate visibility of the following job status profiles:
- Number of batch jobs that are currently held on job queues
- Number of jobs that are currently waiting on held job queues
- Number of batch jobs on job queues that are not attached to an active subsystem
- Total number of batch jobs waiting to run, both on job queues and scheduled
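The four counts listed above can be sketched in a few lines of Python. This is an illustrative model only, not how QSM collects its data: the `BatchJob` record, its fields, and the `job_status_counts` function are hypothetical names, and the sketch assumes a job waiting on a held queue is counted whether or not the job itself is held.

```python
from dataclasses import dataclass


@dataclass
class BatchJob:
    name: str
    held: bool               # the job itself is held on its queue
    queue_held: bool         # the queue the job sits in is held
    queue_attached: bool     # the queue is attached to an active subsystem
    scheduled: bool = False  # still in the scheduler, not yet on a queue


def job_status_counts(jobs):
    """Return the four totals for a list of waiting batch jobs."""
    on_queue = [j for j in jobs if not j.scheduled]
    return {
        "held_on_queues": sum(j.held for j in on_queue),
        "waiting_on_held_queues": sum(j.queue_held for j in on_queue),
        "on_unattached_queues": sum(not j.queue_attached for j in on_queue),
        "total_waiting": len(jobs),  # on job queues plus scheduled
    }


if __name__ == "__main__":
    jobs = [
        BatchJob("NIGHTLY", held=True, queue_held=False, queue_attached=True),
        BatchJob("BILLING", held=False, queue_held=True, queue_attached=True),
        BatchJob("ARCHIVE", held=False, queue_held=False, queue_attached=False),
        BatchJob("REPORTS", False, False, True, scheduled=True),
    ]
    print(job_status_counts(jobs))
```

Any non-zero value in the first three totals indicates work that will not start on its own, which is the condition an operator would want flagged.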
Three new data definitions have been created to show the number of batch jobs, the number of interactive jobs, and the total number of jobs that have finished but still have printed output on output queues.
Paul Ratchford, CCSS product manager, explains the benefits of these new definitions for users: “Having immediate access to this type of information is very useful as it conforms to the principle of a ‘management-by-exception’ environment. In this case, you don’t need to know when new output queues are created, or monitor every single one on the system because it will show up in these totals, and anomalies will be immediately obvious. Previously, if you added new applications that generated output queues, you may not have even known about them until there was a problem. These new parameters offer users a much more granular view,” he says.
New data definitions now show users the total number of jobs in the system, both as a count and as a percentage of the configured maximum. These figures include all job types, including subsystem monitors and system jobs. This provides an extra level of security for users, as they now have an at-a-glance reference of these figures through QSM’s Online Monitor. An extreme case of looping jobs, or of a job submitting another job whose spool files are small (and whose CPU usage is therefore low), could be detected very easily in the total job count. If a system approaches the maximum number of jobs allowed without operators knowing it, they could face a delay in important processing as the system blocks them from starting or submitting new jobs, or possibly even a system failure. Thresholds and associated alerts can be attached to the total number, or to the percentage view, to help ensure the maximum is not reached.
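The percentage view and its thresholds amount to simple arithmetic, sketched below in Python. This is an illustration under stated assumptions, not QSM's implementation: the function names and the 80 and 95 percent alert levels are hypothetical, and a real site would pick its own limits.

```python
def job_usage_percent(total_jobs, max_jobs):
    """Total jobs in the system as a percentage of the configured maximum."""
    if max_jobs <= 0:
        raise ValueError("max_jobs must be positive")
    return 100.0 * total_jobs / max_jobs


def alert_level(percent, warning=80.0, critical=95.0):
    """Map a usage percentage to an alert level (hypothetical thresholds)."""
    if percent >= critical:
        return "critical"
    if percent >= warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for total in (150_000, 170_000, 190_000):
        pct = job_usage_percent(total, 200_000)
        print(f"{total} jobs: {pct:.1f}% -> {alert_level(pct)}")
```

Setting the warning level well below 100 percent leaves operators time to find a looping job before the system starts refusing new work.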
Each time a permanent or temporary object is created on the system, it uses a unique permanent or temporary address. Each system has a finite (albeit very large) number of permanent and temporary addresses, and if either supply is exhausted, i.e. if usage reaches 100 percent, the system will abnormally terminate and require a scratch installation from backup tapes. Again, in a looping scenario, huge amounts of CPU or I/O may not necessarily be generated, but the number of permanent or temporary addresses in use could be increasing rapidly. To guard against this particular threat to the system, two new parameters can be used to return the percentage of permanent addresses used and the percentage of temporary addresses used.