Tune AIX for Optimal Performance

Lots of great tools are available to help you evaluate and manage performance.

When looking at performance issues, I tend to group them into the following categories:

CPU bound
Memory bound
Disk access
Network
Process monitoring

This article looks at some of the popular tools available to gather information. Based on that information, you can then implement performance-related changes if required.

Please note: When considering performance tuning, you are strongly advised to gather evidence of the performance loss or examine the area in which you want to make gains before making tuning changes.

Always note any changes you make so that you can back out if the change you made has caused more issues or has not improved performance. Also, be aware of the "little picture" syndrome; it's no good trying to improve, say, disk operations if you have a slow network. The users will notice little difference.

In this article, I will keep the use of the tools generic rather than tie it to particular applications, which broadens the scope of the article. Let's look at the most common tools available to show or gather system performance-related information.

Nmon

Without a doubt (in my opinion), the best tool to gather or record system information is nmon. Nmon comes with additional tools for presentation. Particularly helpful is the spreadsheet converter, which presents the metrics data in graphical format on your PC using Excel or some other similar spreadsheet product. Download nmon here.

Nmon is typically run from AIX's cron scheduler and commonly runs through a 24-hour cycle. This approach allows the systems administrator to go back in time and view the graphical charts. Nmon will report on the following:

CPU
Memory
Network
File systems
Disks
Top-running processes
Paging spaces
LPARS
Summaries

An example of one of the many reports nmon generates is shown in Figure 1, nmon_cpu. This report shows CPU usage and the applications/processes that were running during a specific time span. The interval for this demonstration represents the nightly batch processing window. We can see that CPU usage peaked between 10:00 p.m. (22:00) and midnight (24:00).

Figure 1: The nmon_cpu report shows CPU usage. (Click images to enlarge.)

To run nmon for a 24-hour cycle (well, nearly 24 hours), you need to know the format of the cycles. It is best to have each daily capture be just short of 24 hours in order to avoid overlap of performance information from one day to the next. The most common parameters are these:

-f means output to spreadsheet format
-t includes top-running processes
-s is the interval in seconds between snapshots
-c is the number of snapshots (counts) to take
-m is the destination directory for the output file, which is named in the format <hostname>_<date>_<time>.nmon

To run for nearly 24 hours (in this demonstration, it's 23.75 hours), I could use the following:

-s would be 900 (900 seconds = 15 minutes)
-c would be 95 counts

Using the above numbers, we come up with 23.75 hours, like so:

15 mins * 95 / 60 = 23.75 hours

So, the following crontab entry would generate a daily system report covering nearly 24 hours. Cron will execute this every day at 9:00 a.m. (09:00).

00 09 * * * /usr/bin/nmon -f -t -s 900 -c 95 -m /opt/dump/nmon
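The interval arithmetic above can be checked with a small shell helper before committing a crontab entry. This is only a sketch: the function name nmon_hours is made up for illustration, and it assumes an awk is available on the system.

```shell
# Hours covered by an nmon capture: interval (seconds) * count / 3600.
# nmon_hours is a hypothetical helper name, not part of nmon itself.
nmon_hours() {
    awk -v s="$1" -v c="$2" 'BEGIN { printf "%.2f\n", (s * c) / 3600 }'
}

# The values used in the crontab entry: -s 900 -c 95
nmon_hours 900 95    # prints 23.75
```

Trying other pairs (for example, a shorter interval with a larger count) shows quickly whether a capture would run past the 24-hour mark and overlap the next day's run.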

Topas

The tool that will generally be the first port of call for any system admin experiencing performance issues is topas. Though topas can record and graph, it's more commonly used in real-time mode to see what's happening on the system at a particular point in time. Topas will show everything you need to know to make a quick decision on where the impact is. Most operations are carried out using single keystrokes, so be sure to read the man page to learn how to navigate through the different real-time reports. Topas will report on these areas:

Memory
Paging space
NFS
CPU
Network
LPARS
Workloads
Disks
Processes using resources

Figure 2 shows a topas screenshot. In this example, it's displaying in real time the top processes that are using the paging space. To display this screen, press P on the keyboard and then use the arrow keys to move to the PAGESPACE column. To highlight the top CPU-consuming processes, use the arrow keys to move to the CPU% column.

Figure 2: In this screenshot, topas shows the processes that are using the paging space.

Sar

The system activity reporter command, sar, is the staple diet of system admins for gathering command-line-readable information about the system. It's typically run from cron but is occasionally run directly from the command line. By default, sar reports CPU activity: the percentage of time spent in user mode, in system mode, waiting for I/O, and idle, along with the physical processors consumed. Sar can also report on many other system activities. The following sar example will take three samples at five-second intervals:

# sar 5 3

AIX rs6000 1 7 00C23BED4C00 11/17/11

System configuration: lcpu=4 ent=1.00 mode=Uncapped

13:21:35 %usr %sys %wio %idle physc %entc

13:21:40 30 59 0 11 1.24 124.3

13:21:45 24 64 1 11 1.09 108.9

13:21:50 36 53 1 10 1.63 163.0

Average 31 58 0 11 1.32 132.0
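When sar is run from a script, usually only one figure from the Average line is of interest. The following sketch pulls out the average %idle value; the function name sar_avg_idle is hypothetical, and it assumes the column order shown in the output above (%usr, %sys, %wio, %idle), which may differ with other sar options.

```shell
# Print the average %idle from sar CPU output read on stdin.
# Assumes the columns are: time %usr %sys %wio %idle ... (as in the sample above).
sar_avg_idle() {
    awk '$1 == "Average" { print $5 }'
}

# Usage on a live system (not run here):
#   sar 5 3 | sar_avg_idle
```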

Check the Loads

One of the oldest and most commonly used commands is the uptime command. It's a favorite because its output is used in many scripts to report on load averages. It's commonly used in alert email scripts that run on the system and send a message when load averages hit a certain level, like 60 or 70, to forewarn system admins of a potential issue. The output shown below gives the current time, how long the system has been up, and the number of users connected. The load average is displayed for the last 1, 5, and 15 minutes:

$ uptime
03:37PM up 179 days, 4:38, 27 users, load average: 3.34, 2.39, 2.26
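An alert script of the kind described above might begin along these lines. This is a minimal sketch, not a complete monitoring solution: the function names load_1min and load_exceeds are invented for illustration, the threshold is whatever suits your system, and the parsing assumes the "load average:" wording shown in the sample output.

```shell
# Extract the 1-minute load average from uptime-style output on stdin.
load_1min() {
    sed -n 's/.*load average: *\([0-9.]*\).*/\1/p'
}

# Succeed (exit 0) if the 1-minute load is at or above the threshold in $1.
# Returns 2 if no load average could be found in the input.
load_exceeds() {
    load=$(load_1min)
    [ -n "$load" ] || return 2
    awk -v l="$load" -v t="$1" 'BEGIN { exit !(l >= t) }'
}

# Usage on a live system (not run here), with a hypothetical threshold of 60:
#   if uptime | load_exceeds 60; then
#       echo "Load average high on $(hostname)"
#   fi
```

The echo in the usage comment is where a real script would send its email.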

Mpstat

Mpstat will report on all CPU devices, which is particularly useful when investigating CPU bottlenecks. In the following output, mpstat will run once at a 5-second interval:

# mpstat 5 1

System configuration: lcpu=4 ent=1.0 mode=Uncapped

cpu min maj mpc int cs ics rq mig lpa sysc us sy wt id pc %ec lcs

0 4940 0 1 632 685 268 0 320 100 263924 42 55 0 4 0.57 35.1 277

1 990 0 3 1387 2234 805 0 684 100 130290 28 47 0 25 0.27 16.6 649

2 3943 0 2 531 663 223 0 389 100 276520 44 54 0 3 0.57 34.9 270

3 1298 0 2 1856 2742 846 0 752 100 82141 31 40 0 29 0.22 13.4 650

ALL 11171 0 8 4406 6324 2142 0 2145 100 752875 39 51 0 10 1.63 163.1 1846

Of particular interest is the column maj; this is the total number of major page faults (i.e., when AIX tries to get a page that is not resident in real memory). In the above output, there are none, so all looks good. A quick way to see how busy your CPUs are is to use this command:

# mpstat -s

System configuration: lcpu=4 ent=1.0 mode=Uncapped

      Proc0               Proc2
     39.20%              38.60%
  cpu0      cpu1      cpu2      cpu3
 20.54%    18.66%    20.00%    18.61%

In the above output, we can see that we have two physical processors (Proc0 and Proc2), each split into two logical CPUs. The percentages show how the load is spread across them and how busy each one is.
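The check on the maj column in the per-CPU mpstat report can also be scripted. The sketch below finds the maj column from the header line rather than assuming its position, then prints the value from the ALL row; the function name mpstat_total_maj is hypothetical.

```shell
# Print the system-wide major page fault count from mpstat output on stdin.
# The maj column is located from the header row (the line starting with "cpu").
mpstat_total_maj() {
    awk '
        $1 == "cpu" { for (i = 1; i <= NF; i++) if ($i == "maj") col = i }
        $1 == "ALL" && col { print $col }
    '
}

# Usage on a live system (not run here):
#   mpstat 5 1 | mpstat_total_maj
```

A value that stays above zero across several runs would suggest looking at memory next.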

Svmon

To look at memory consumption, svmon is the command line tool for you. Svmon will report on general memory usage across the system. It can be run at intervals to gather information over a period. AIX differs from other UNIX operating systems in that it will use all the available memory from the start, so if you see the system using nearly all the memory, this is normal. To display the global memory spread that is in use on a system, with the usage in GB, use this code:

# svmon -G -O unit=GB

Unit: GB

-------------------------------------------------------------------------------

size inuse free pin virtual available

memory 8.50 8.48 0.02 2.14 13.9 1.04

pg space 17.1 7.24

work pers clnt other

pin 1.85 0 0 0.29

in use 6.94 0 1.54

The above output reports on the size of real memory, total memory currently in use, free memory, memory that is pinned (that is, it will stay in memory and not be swapped out to paging space), and virtual memory. The pg space line shows the size of the paging space and how much of it is in use. The report also breaks down pinned and in-use memory into working, persistent, and client memory.
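Because nearly full real memory is normal on AIX, the paging space figures are often the more useful ones to track from svmon -G. As a rough sketch, the following helper turns the pg space line into a percentage used; the function name pgsp_used_pct is made up, and it assumes the line layout shown above (size first, then inuse).

```shell
# Print paging space used as a percentage, from svmon -G output on stdin.
# Assumes a line of the form: pg space <size> <inuse>
pgsp_used_pct() {
    awk '$1 == "pg" && $2 == "space" && $3 > 0 { printf "%.1f\n", $4 / $3 * 100 }'
}

# Usage on a live system (not run here):
#   svmon -G -O unit=GB | pgsp_used_pct
```

For the sample output above (17.1 GB of paging space with 7.24 GB in use), this works out to roughly 42 percent.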

To find out how much memory user ukinst2 is using, use this code:

# svmon -U | head -n 3; svmon -U -O unit=GB |grep ukinst2

User Inuse Pin Pgsp Virtual

ukinst2 1.16 0.03 0.03 0.50

In the above output, we can determine that user ukinst2 is using 1.16 GB of real memory, of which 0.03 GB (about 30 MB) is pinned in memory; about 30 MB is in paging space, and 0.50 GB (about 500 MB) is currently in virtual memory.

You can also use svmon -U to produce a list of all users and their respective memory use.

There is also the vmstat command, which provides reports similar to those of svmon but without as much detail.

Lvmstat

If the system is suffering from extreme disk activity, you need to identify the source. The lvmstat command will provide information on hotspots in your Volume Groups or Logical Volumes. First, use lvmstat to enable statistics collection on the required Volume Group (in this example, appsvg):

# lvmstat -v appsvg -e

Next, you could run a report on all Logical Volumes contained in that Volume Group, running at 2-second intervals, producing five reports:

# lvmstat -v appsvg 2 5