What's the biggest cause of performance problems on IBM i servers? Not even three of the major performance-tool vendors completely agree on the answers.
Despite its reputation for security and reliability, even the IBM i server platform isn't perfect. As workloads increase and the demand for 24x7 access grows, sooner or later every enterprise reaches a point where system performance seems to start declining. But when it comes to pinning down the reasons, there's no easy answer because there are so many potential causes.
Trails to Solutions Lead in Many Directions
"By far, the biggest cause of performance problems on IBM i servers we have seen in our 20 years of experience is poor and inefficient database access," notes Julie Dillon, vice-president of sales and marketing at MB Software & Consulting. "Record-set processing is used in many cases where record-level access is more efficient. Since SQL is easier to code, it ends up being used even in transaction processing, where native API calls are a better choice. A less-efficient database technique is used, using an analytical database access method for interactive data access."
She goes on to cite three more of her top four causes. "Another cause of performance issues involves excessive initiation and termination. The constant opening and closing of files, persistent loading and unloading of programs in and out of memory, continual starting and stopping of jobs/sessions/connections, repeated SQL requests—these kinds of overloads can bog down even the mightiest of systems. Also, the use of a dynamic connection when static and persistent connections would be better. Finally, we see clients who don't have their hardware resources (CPU, storage, memory, and network) aligned with application workload requirements."
Randy Watson, president of Midrange Performance Group, cites his top three culprits as "disk configuration and performance, poor I/O-related application design, and poor virtualization configuration."
"Performance management is often so focused on the system that it fails to look at root causes, which I suggest are indexing, workload balance, and data volumes," offers Paul Wade, regional lead of sales engineering for Europe and growth markets at Vision Solutions. "Volumes of data are increasing, and this puts significant pressure on indexing strategies and reporting. Old hardware is also a cause of worsening performance exacerbated by data growth, workload balance, or poor indexing. Similarly, poor workload balancing will put pressure on response times and batch run times."
Part of the problem is identifying real causes, Wade elaborates. "Performance problems will ultimately manifest themselves in pressures on CPU, storage, or memory capacity, but we need to be careful to separate workload stresses and real performance problems. For example, a Java process that is consuming large amounts of memory would affect response times, and this could be interpreted as a performance problem related to lack of memory, but adding more memory will not necessarily solve the problem. When I find that 17 percent of a production server is consumed by Save Files that no one knows anything about, and the storage team is struggling with disk spikes that cause ASP overflow, I don't recommend installing more disk immediately. Archive or remove those Save Files (if possible), and you might just solve that problem. Deleted records will often be cited on IBM i as root causes of performance problems as well, but it is really a factor of the above three areas. Without the data volumes, need for 24x7 processing or perhaps high indexing requirements, you would have the time to reorganize. Similarly, temporary storage problems occur based on other factors outside of a genuine programming error."
"By far, the biggest issue today in understanding performance is virtualization," counters Midrange Performance Group's Watson. "A lot of performance metrics are tracking the utilization of a virtual resource which had to be mapped back to a physical resource. You can have many different partitions using the same physical resource. Secondly, the virtual resource can change over time. For example, the CPU desired cores setting can change all the time. This has to be accounted for especially when keeping historical data. Thirdly, you need to monitor the owner of the physical resource (most often VIOS) to understand the virtual to physical mapping."
Keeping an Eye on Performance Data
It's easy to think that keeping a watch on performance data will alert system managers to problems in time to fix, or at least mitigate, them. However, not all enterprises tend to look at performance data the same way, and that has its effects.
"The knee-jerk reaction is to see a spike and try and find out what causes a resource spike, (CPU, I/O, memory, network)," observes MB Software & Consulting's Dillon. "However, what is most often overlooked is that the whole system is running at 80 percent or more capacity when the spike occurs. What we try to get the customer to understand is the real problem exists, not at the intermittent spikes—those inevitably will happen for other reasons and are secondary to the main issue—but to address the primary cause, which is the whole system is constantly running at almost full capacity (80 percent or more), and it shouldn't be. Buying more hardware will not fix this issue; it will just increase the resource availability, thus reducing the overall usage. All you've done is change a variable in your percentage equation. Your underlying problem is still not solved. By having the customer look at trend data, we can work to address the root cause and not just put a Band-Aid on the symptom, the usage spike."
"Large enterprise shops have real-time monitoring capabilities. Mid-tier companies rely more on periodic reporting. Smaller businesses tend to not focus on performance until there is an issue and then do their best to resolve it," Wade comments. "In general, however, we see a tendency towards periodic reporting. The processing overhead of collecting and analyzing performance data can be prohibitive and is very time-consuming without tools to highlight problems for you. Also, the basic setup of IBM i servers is so good that the need to run additional performance management tools is not so great today. Periodic reporting with the occasional deep-dive into a specific problem works best for most shops. With the rise of a more service-orientated model, this is a good thing because companies do not want to have performance experts on the payroll or pay for these services on a regular basis."
"I see more focus on data lifecycle management nowadays, and that's a good step towards getting data volumes under control and utilizing the myriad of storage options more effectively. This in turn relieves pressure on data volumes and improves response times," Wade concludes.
Finding a Solution for Your Environment
Below are products that help system managers improve application, database, and system performance, divided into four general groups. Application performance management solutions are tools that focus on improving execution of applications (as opposed to system performance). Database performance management solutions specialize in improving database and query efficiency. System performance monitoring solutions primarily zero in on the use of system resources such as disk, and either alert IT personnel to problems or take automated initial steps to alleviate them. System performance acceleration solutions are designed to circumvent interactive CPW limits placed on older system models.
Application Performance Management Solutions for IBM i
GiAPA gathers application performance data every 15 seconds and helps system managers see which applications are causing performance problems, as well as identify specific threads, programs, and statements that might be causing problems. In addition, GiAPA helps programmers discover how programs might be improved to run more efficiently.
JENNIFER is an application performance-management solution for Java and IBM WebSphere environments on IBM i and other platforms. Tailored for production environments, JENNIFER monitors network operations, database activity, system load and performance, and other application internal services. JENNIFER reports data via Ecclus, a PC-based user interface with 3D graphical displays.
Macro 4, a division of UNICOM Global
SUPERMON for Java is an application performance monitoring solution that supports WebSphere, WebLogic, JBoss, and Tomcat application server environments. It focuses on application execution efficiency and maintaining service levels for applications based on Java. The product includes tools that isolate applications taking the longest to execute transactions, a database correlation facility that correlates performance data between Oracle and DB2 databases, and summary reporting.
Database Performance Management Solutions for IBM i
HomeRun is a suite of tools that let system managers improve SQL performance, optimize DDS logical files and SQL indices, audit data access, and control resources used by queries.
System Performance Monitoring Solutions for IBM i
MessengerConsole monitors all message queues in single or networked servers, sends alerts of problems, and automates message-handling throughout a network of servers.
CCSS, a HelpSystems Company
QSystem Monitor is a multipurpose system-monitoring application that operates in real time and provides a graphical interface and graphical reports. QSystem Monitor can monitor systems, disks, jobs, networks, availability, and even critical business application data via SQL. Other features let system managers set thresholds for alerts to IT personnel.
A Software as a Service (SaaS) solution, iSeries Watchdog monitors IBM i servers remotely and provides key system health threshold alert definitions and customized messaging to predefined email groups in the event of problems. Areas analyzed include disk/DASD usage, system messages, tape status, object tracking, and system problem interrogation.
Advanced Automation Suite is a central-console product that includes a performance analyzer, a disk-space manager, and a spooled file manager. The suite helps users monitor and manage a wide variety of system activities, including SLA compliance, output and distribution queues, FTP activity, the security audit journal, and disk space.
Authority Swapper lets users change to a user profile with greater privileges for specific tasks. The product includes a facility for recording screens accessed and changes made during the profile swap for auditing purposes.
Disk Space Manager analyzes all disk resources, including the IFS and independent auxiliary storage pools (iASPs) to help resolve performance problems. A hierarchical viewer lets users view all data in customizable formats or a Windows-style drilldown display and access data sorting and filtering options.
Operations Center Suite is a multiplatform monitoring and performance solution for IBM i, AIX, Linux, UNIX, and Windows. The suite gives users and system managers a control point for managing system performance, job scheduling, disk space, system messages, and spooled files across multiple servers that support remote executing daemons.
Performance Analyzer provides both real-time and historical system-performance data and presents it in a GUI or via selected mobile devices. A report wizard provides standard or user-customizable performance reports.
Snapshot TSC monitors IBM i servers and notifies users of performance problems or application issues. The product can track multiple servers from a single console, can follow server performance by business units and groups, tracks critical performance indicators according to user-definable rules, highlights network response times and delays, and sends alerts via email, mobile devices, and network pop-up windows.
Systems Operations Suite monitors IBM i messages, job queues, output queues, and devices for issues or threshold breaches. It also enables proactive monitoring of key business applications, FTP activity, and the security audit journal, as well as automatic management of system events.
Robot/NETWORK oversees networks of servers or partitions from a central console. It offers centralized control of the Robot software running on IBM i partitions as well as performance monitoring, exception-based management, and integration for servers and events across any environment, with results visible from any Web browser or mobile device.
Robot/SPACE monitors ASPs, iASPs, libraries, IFS objects, active job-storage levels, and other system-storage attributes. It monitors temporary job storage, collects disk space usage statistics, predicts future disk storage needs, and performs over 20 disk space cleanup duties. It also increases system performance by monitoring flexible storage thresholds for active jobs, QTEMP files, and spooled files.
IBM's iDoctor for IBM i is a suite of real-time analyzers (including the PEX analyzer), designed for both novices and experts, to inspect all aspects of system performance. Tools include analyzers for job, thread, and task performance data; database waits, I/O activity, and CPU use; job-run average response times, I/O rates, and memory-pool use; and heap analysis for systems using the Java Virtual Machine (JVM).
Performance Explorer collects performance data to help analysts identify the source of problems that can't be identified by the Performance Tools for IBM i product or other utilities included with the operating system. The most detailed mode can trace performance activity for one or more specific jobs or tasks on a system.
Performance Tools for IBM i is a collection of utilities for viewing, analyzing, reporting, and graphing performance data gathered by Collection Services. Output options include displays, graphs, reports, and moving data to iSeries Navigator.
PM for Power Systems is a performance-analysis and capacity-planning software application for IBM Power Systems servers running i5/OS, AIX, or Linux. It provides summary-level information on a system's current and long-term utilization trends, provides interactive access to 24 months of historical performance data, and provides data on virtualization capabilities.
Systems Director Navigator for i5/OS Performance displays summarized information in multiple charts and graphs with drill-down capabilities. It helps system managers spot and diagnose performance problems quickly.
MB Software & Consulting
Workload Performance Series is an integrated suite that analyzes System i application-processing environments and optimizes application performance in IBM i environments. Suite functions include monitoring and analysis of system resources, source-code execution, historical data, trends, resource utilization, and statistical overviews. Modules include performance-tuning tools for queries, journal transactions, system workloads, disk activity, and spooled files.
Midrange Performance Group
EXPO is a graphical Web interface for higher-level executives and other users that displays important IBM i performance metrics such as memory, CPU, and disk use on demand. The GUI includes color-coded graphics and enables time-dependent views of statistics from as little as the last 15 minutes to as much as the previous year.
Performance Navigator is a graphical application that runs on a PC and carries out performance analysis of IBM i servers. It provides 30 different graphs and reports on various hardware and software aspects of system performance.
Power Navigator runs on Windows PCs to provide workload modeling for IBM i and other servers running AIX, HP-UX, Linux, and Oracle Solaris. The product lets users access days, months, or years of performance and capacity data stored on their systems and notifies users of problem increases in storage use.
CPUScope is a performance enhancer for IBM i that monitors CPU and I/O activity. It searches for jobs that are consuming excessive resources and either takes predesignated remedial actions or sends alert messages in response to problems. Users can vary product-action execution based on time of day, day of the week, or day of the month.
DASD-Plus is an automated disk-management utility that offers 27 disk-maintenance routines, with a special emphasis on DB2 performance. It analyzes disk usage based on multiple parameters, gathers data using the PEX analyzer of IBM's iDoctor tool, and runs disk-optimization routines at user-specified intervals.
DASD-Plus Chart prepares graphical comparisons of data gathered by DASD-Plus or other tools and presents them in user choice of pie and bar charts. The charts are configurable to show details at the library, directory, file, or object level. The solution also provides trend charts to support capacity planning.
SoftLanding Systems, a division of UNICOM Global
(SUPERMON products are also available via Macro 4, a division of UNICOM Global.)
SUPERMON is a disk analysis and management product for networked IBM i servers. It provides automated disk management; alerts IT personnel to critical situations; removes deleted record space; automates disk usage evaluations; analyzes disk space growth by specific applications; and produces disk-usage reports on demand or on a designated time schedule. The product includes a PC-based GUI for data display.
SUPERMON for iSeries V100 provides a central console for monitoring system components and performance data. The product collects and stores data automatically for historical analysis, provides "drill-down" tools that help operators analyze performance problems, and offers guidance via an advisor function.
Tango/04 Computing Group
VISUAL Control Center is a suite of integrated products that help monitor and manage performance of up to 999 networked IBM i, AIX, Linux, and UNIX servers and report data to a PC interface. The solution helps detect abusive users and programs, monitor system and end-user activity in real time and historically, and tune LPARs and memory pools.
Vision Solutions' iSCORE freeware is a downloadable system capacity analysis utility that analyzes disk usage by various categories and provides an overall system score in a summary report. The utility evaluates performance of indexes, journals, queries, SAVF, and system values.
MIMIX Director is a multifaceted suite that offers 70 GUI-based reports on system resources and objects. In addition, its features help in automating system management tasks, tracking changes to physical and logical file dependencies, and assisting with system capacity planning. Specifically, in the area of performance tuning, the suite automates disk optimization and enhances system performance, monitors and enforces management-designed usage rules, makes (and optionally carries out) recommendations for increasing performance, and maintains complete audit trails of all its activities.
System Performance Acceleration Solutions for IBM i
Fax*Star, a division of SEPE
Max400 tunes and maximizes interactive CPW performance of System i servers running OS/400 V4R1 through V7R1. It is designed for situations in which the interactive capacity of the machine may have been set at a suboptimal level as a function of the machine model.
Kisco Information Systems
GoFaster is a performance accelerator for older System i machine models that have artificial limitations on their interactive operations. It doesn't offer performance-analysis functions.