Sunday, December 19, 2010

SAR Command

The sar command collect, report, or save UNIX / Linux system activity information. It will save selected counters in the operating system to the /var/log/sa/ directory . From the collected data, you get lots of information about your server:

1. CPU utilization
2. Memory paging and its utilization
3. Network I/O, and transfer statistics
4. Process creation activity
5. All block devices activity
6. Interrupts/sec etc.

sar output can be used for identifying server bottlenecks. However, analyzing information provided by sar can be difficult, so use kSar, which can take sar output and plot a nice easy to understand graph over period of time

User Level: Intermediate
Sadc (system activity data collector) is the program that gathers performance data. It pulls its data out of the virtual /proc filesystem, then it saves the data in a file (one per day) named /var/log/sa/saDD where DD is the day of the month.

Two shell scripts from the sysstat package control how the data collector is run. The first script, sa1, controls how often data is collected, while sa2 creates summary reports (one per day) in /var/log/sa/sarDD. Both scripts are run from cron. In the default configuration, data is collected every 10 minutes and summarized just before midnight.

If you suspect a performance problem with a particular program, you can use sadc to collect data on a particular process (with the -x argument), or its children (-X), but you will need to set up a custom script using those flags.

As Dr. Heisenberg showed, the act of measuring something changes it. Any tool that collects performance data has some overall negative impact on system performance, but with sar, the impact seems to be minimal. I ran a test with the sa1 cron job set to gather data every minute (on a server that was not busy) and it didn't cause any serious issues. That may not hold true on a busy system.

Creating reports

If the daily summary reports created by the sa2 script are not enough, you can create your own custom reports using sar. The sar program reads data from the current daily data file unless you specify otherwise. To have sar read a particular data file, use the -f /var/log/sa/saDD option. You can select multiple files by using multiple -f options. Since many of sar's reports are lengthy, you may want to pipe the output to a file.

To create a basic report showing CPU usage and I/O wait time percentage, use sar with no flags. It produces a report similar to this:

01:10:00 PM CPU %user %nice %system %iowait %idle
01:20:00 PM all 7.78 0.00 3.34 20.94 67.94
01:30:00 PM all 0.75 0.00 0.46 1.71 97.08
01:40:00 PM all 0.65 0.00 0.48 1.63 97.23
01:50:00 PM all 0.96 0.00 0.74 2.10 96.19
02:00:00 PM all 0.58 0.00 0.54 1.87 97.01
02:10:00 PM all 0.80 0.00 0.60 1.27 97.33
02:20:01 PM all 0.52 0.00 0.37 1.17 97.94
02:30:00 PM all 0.49 0.00 0.27 1.18 98.06
Average: all 1.85 0.00 0.44 2.56 95.14

If the %idle is near zero, your CPU is overloaded. If the %iowait is large, your disks are overloaded.

To check the kernel's paging performance, use sar -B, which will produce a report similar to this:

11:00:00 AM pgpgin/s pgpgout/s fault/s majflt/s
11:10:00 AM 8.90 34.08 0.00 0.00
11:20:00 AM 2.65 26.63 0.00 0.00
11:30:00 AM 1.91 34.92 0.00 0.00
11:40:01 AM 0.26 36.78 0.00 0.00
11:50:00 AM 0.53 32.94 0.00 0.00
12:00:00 PM 0.17 30.70 0.00 0.00
12:10:00 PM 1.22 27.89 0.00 0.00
12:20:00 PM 4.11 133.48 0.00 0.00
12:30:00 PM 0.41 31.31 0.00 0.00
Average: 130.91 27.04 0.00 0.00

Raw paging numbers may not be of concern, but a high number of major faults (majflt/s) indicate that the system needs more memory. Note that majflt/s is only valid with kernel versions 2.5 and later.

For network statistics, use sar -n DEV. The -n DEV option tells sar to generate a report that shows the number of packets and bytes sent and received for each interface. Here is an abbreviated version of the report:

# sar -r -n DEV -f /var/log/sa/sa19
Linux 2.6.9-42.ELsmp (util102) 12/19/2010

12:00:01 AM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
12:10:01 AM lo 59.83 59.83 5067.06 5067.06 0.00 0.00 0.00
12:10:01 AM eth0 1210.52 1062.13 712124.65 498563.98 0.00 0.00 0.00
12:10:01 AM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:20:01 AM lo 52.41 52.41 4546.30 4546.30 0.00 0.00 0.00
12:20:01 AM eth0 1084.95 876.36 677283.00 280620.08 0.00 0.00 0.00
12:20:01 AM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:20:01 AM sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00


To see network errors, try sar -n EDEV, which shows network failures, you can see following output there is no network errors.

# sar -r -n EDEV -f /var/log/sa/sa19

Linux 2.6.9-42.ELsmp (util102) 12/19/2010

12:00:01 AM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
12:10:01 AM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:20:01 AM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:20:01 AM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Reports on current activity

Sar can also be used to view what is happening with a specific subsystem, such as networking or I/O, almost in real time. By passing a time interval (in seconds) and a count for the number of reports to produce, you can take an immediate snapshot of a system to find a potential bottleneck.

For example, to see the basic report every second for the next 10 seconds, use sar 1 10. You can run any of the reports this way to see near real-time results.

Benchmarking

Even if you have plenty of horsepower to run your applications, you can use sar to track changes in the workload over time. To do this, save the summary reports (sar only saves seven) to a different directory over a period of a few weeks or a month. This set of reports can serve as a baseline for the normal system workload. Then compare new reports against the baseline to see how the workload is changing over time. You can automate your comparison reports with AWK or your favorite programming language.

In large systems management, benchmarking is important to predict when and how hardware should be upgraded. It also provides ammunition to justify your hardware upgrade requests.

Digging deeper

Most hardware performance problems are related to the disks, memory, or CPU. Perhaps more frequently, application programming errors or poorly designed databases cause serious performance issues.

Whatever the problems, sar and friends can give you a comprehensive view of how things are working and help track down bottlenecks to fix a sluggish system. The examples here just scratch the surface of what sar can do. If you take a look at the man pages, it should be easy to customize a set of reports for your needs.

No comments:

Post a Comment