Document ID: 0010
Topic: Monitoring, System Performance
Created: 2007-05-20
Last Updated: 2009-03-25
Author: Stefan Parvu
References: K9toolkit
OS: Solaris 10+

System Data Recorder

Monitoring the IT infrastructure is a key part of ensuring business continuity and preparing for future growth. SDR is a simple toolkit containing a number of data collectors, used to record and report data from your Solaris servers. SDR is designed mainly around the Solaris operating system because of its kernel statistics interface, but it can easily be extended to other OSes.

Scope

The Solaris operating environment already has many utilities to debug and observe the entire system or individual processes. Third-party software can be installed to monitor the system or the applications: BMC Patrol/Predict, TeamQuest, Tivoli, Sitescope, Nagios, etc. Some of these packages focus on event management, others on performance analysis and capacity planning. SDR sits in the performance analysis space and focuses on capacity planning, although a lot of work remains to be done. Together with PDQ, a simple and powerful analytic modelling tool, SDR can be used to measure your infrastructure capacity.
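As a rough illustration of how recorded utilisation figures can feed an analytic model, the sketch below applies the utilisation law (D = U / X) and the residence time of a single open queue, R = D / (1 - U). It does not use PDQ's own API: the function names, the 60% utilisation and the 120 requests/s throughput are hypothetical, and a single M/M/1 queue is a deliberate simplification of what a real PDQ model would describe.

```python
def service_demand(utilisation: float, throughput: float) -> float:
    """Utilisation law: service demand D = U / X, in seconds per request."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return utilisation / throughput


def residence_time(demand: float, arrival_rate: float) -> float:
    """Residence time of one open M/M/1 queue: R = D / (1 - U), where U = X * D."""
    u = arrival_rate * demand
    if u >= 1.0:
        raise ValueError(f"resource saturated (U = {u:.2f})")
    return demand / (1.0 - u)


# Hypothetical figures: 60% CPU utilisation recorded while the
# application served 120 requests per second.
demand = service_demand(0.60, 120.0)          # about 5 ms per request
for rate in (120.0, 180.0):                   # today, and with 50% more load
    print(f"X = {rate:.0f}/s  R = {residence_time(demand, rate) * 1000:.1f} ms")
```

With these hypothetical numbers a 50% increase in load raises the predicted residence time from 12.5 ms to 50 ms, which is the kind of non-linear effect capacity planning has to anticipate.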

Mainly we are interested in observing and recording:

All these numbers will help us in developing a simple capacity planning setup for our site.

SDR can help where the budget is limited and the time to deploy the solution is an important factor for your site. You don't want to spend a lot of money and time setting up an expensive RDBMS for your reports, but rather a simple and reliable solution that reduces maintenance to almost zero. SDR uses RRDTool as its core for storing and reporting the data.

Why SDR?


Table of Contents

Data Recorder

Reporting

Troubleshooting

Demo Site

Bug Tracking


Design

The System Data Recorder is organized in two main parts: the collection part, which records the data on each system, and the reporting side, where the data is permanently stored and simple reports and graphs are generated. Some configurations use only the recording part, without the reporting side.

The data recorder consists of several simple utilities, written in Korn shell and Perl, which extract telemetry from the Solaris kernel statistics (kstat) module. Some recorders also gather their data directly from processes, using OS or third-party utilities. There are five recorders that should be installed and deployed on every system, plus optional recorders needed only in certain cases: CMT and JVM.

If your system uses some form of virtualization, the recorders operate from the global level. If the virtualization includes domains or Xen, the recorders are deployed in each of these systems.


Recorded data:

There are five main recorders: sysrec, netrec, nicrec, zonerec and cpurec. Each of them is a simple Perl or Ksh utility running as a separate process, lightweight and designed not to hog the system even when it is under high utilisation. Additionally, corerec and jvmrec should be deployed on systems based on the CMT architecture or running a Java Virtual Machine, respectively.

Each recorder is managed by SMF, the Solaris Service Management Facility, which keeps the recorders running by restarting them automatically if one fails or exits unexpectedly. SMF also makes dependency checking easy: for instance, a recorder should not start if the local filesystem is not mounted or the network interfaces are not present.


Each recorder writes its data to a file called the raw output file. Every midnight this file is rotated with the logadm utility and compressed, which keeps the stored data small and easy to transport to the reporting system. Most collectors record their raw data directly in RRD format, so it is easy to import into the Round-Robin Database, the final place where the data is stored for one year or whatever period your site requires.
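To make the one-year retention concrete: an archive that keeps every sample for a year needs one row per step, i.e. 365 × 86400 / step rows. The sketch below builds an `rrdtool create` argument list for the eight sysrec columns. The 60-second step, the data-source names and the single AVERAGE archive are assumptions chosen for illustration, not SDR's actual RRD layout.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def rra_rows(step: int, retention: int = SECONDS_PER_YEAR) -> int:
    """Rows needed to keep one primary data point per step for `retention` seconds."""
    return -(-retention // step)  # ceiling division


def rrd_create_args(rrd_file: str, step: int, ds_names: list[str]) -> list[str]:
    """Build an `rrdtool create` command line (hypothetical layout)."""
    args = ["rrdtool", "create", rrd_file, "--step", str(step)]
    for name in ds_names:
        # GAUGE data source, heartbeat of two steps, minimum 0, no maximum (U).
        args.append(f"DS:{name}:GAUGE:{2 * step}:0:U")
    # One AVERAGE archive: xff 0.5, 1 step per row, one year of rows.
    args.append(f"RRA:AVERAGE:0.5:1:{rra_rows(step)}")
    return args


# Hypothetical data-source names for the eight sysrec columns.
SYSREC_DS = ["cpu_util", "mem_util", "disk_util", "net_util",
             "cpu_sat", "mem_sat", "disk_sat", "net_sat"]

print(" ".join(rrd_create_args("sysrec.rrd", 60, SYSREC_DS)))
```

The returned list can be passed unchanged to `subprocess.run()` on a host where rrdtool is installed.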

Recorders

The recording part consists of the following collectors or recorders:

Item      Description                          Based On
sysrec    system utilisation and saturation    Perl5, Kstat
cpurec    per-CPU detailed statistics          Perl5, Kstat
nicrec    network interface statistics         Perl5, Kstat
netrec    network protocol statistics          Perl5, netstat
zonerec   zone statistics                      Ksh, prstat
corerec   CMT T1, T2 processor statistics      Ksh, Perl5, cpustat
jvmrec    garbage collection statistics        Ksh, jstat

sysrec

sysrec is part of the K9Toolkit, written by Brendan Gregg, a collection of free Perl scripts used to troubleshoot and observe Solaris systems (see the Appendix for more details). The recorder has been modified to output its data in RRD format.

sysrec records system utilisation and saturation and is used as a starting point for observing the system's health.

The output from sysrec is displayed below:

Columns: timestamp, CPU Util %, Mem Util %, Disk Util %, Net Util %, CPU Sat %, Mem Sat %, Disk Sat %, Net Sat %
1225038537: 10.56: 70.79: 9.87: 0.13: 0.02: 0.02: 0.19: 0.00
1225038539: 3.92: 70.79: 0.00: 0.00: 0.00: 0.00: 0.00: 0.00
1225038538: 2.94: 70.79: 0.00: 0.00: 0.00: 0.00: 0.00: 0.00
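Because each raw line is already `timestamp: value: value: ...`, turning it into RRD input is mostly a matter of stripping the blanks. The sketch below parses a sysrec line into named fields and produces the `timestamp:v1:v2:...` string that `rrdtool update` accepts; it can also read a rotated, gzip-compressed raw file. The field names are our own labels for the columns listed above, and the `.gz` suffix check is an assumption about how rotated files are named.

```python
import gzip

SYSREC_FIELDS = ["cpu_util", "mem_util", "disk_util", "net_util",
                 "cpu_sat", "mem_sat", "disk_sat", "net_sat"]


def split_raw(line: str) -> list[str]:
    """Split a 'ts: v1: v2: ...' raw line into stripped fields."""
    return [p.strip() for p in line.strip().rstrip(":").split(":")]


def parse_sysrec(line: str) -> dict:
    """Map one raw sysrec line onto named fields."""
    parts = split_raw(line)
    if len(parts) != 1 + len(SYSREC_FIELDS):
        raise ValueError(f"unexpected field count in {line!r}")
    record = {"timestamp": int(parts[0])}
    record.update(zip(SYSREC_FIELDS, map(float, parts[1:])))
    return record


def to_rrd_update(line: str) -> str:
    """Return the 'timestamp:v1:v2:...' form accepted by `rrdtool update`."""
    return ":".join(split_raw(line))


def read_raw(path: str):
    """Yield non-empty lines from a raw file, gzip-compressed (.gz) or plain."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.strip():
                yield line


sample = "1225038537: 10.56: 70.79: 9.87: 0.13: 0.02: 0.02: 0.19: 0.00"
print(parse_sysrec(sample)["cpu_util"])
print(to_rrd_update(sample))
```

Note that `rrdtool update` rejects a sample whose timestamp is not later than the last one already stored in the RRD, so lines should be fed in ascending timestamp order.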

cpurec

cpurec collects per-CPU data from kstat. The recorder outputs its data in RRD format.

cpurec is mainly used to observe CPU activity and analyse how the CPUs are used in the system, which is useful for capacity planning. Recording points:

The output from cpurec is displayed below:

Columns: timestamp, Cpuid, Xcalls, Intr, iThr, Csw, Icsw, Migr, Smtx, Syscalls, User %, Sys %, Idle %
1225039500: 1: 97: 568: 49: 1195: 69: 190: 40: 4183: 5.95: 4.37: 89.68
1225039500: 0: 98: 936: 513: 1169: 47: 190: 41: 4097: 6.17: 4.62: 89.21
1225039504: 1: 0: 90: 3: 219: 1: 28: 2: 289: 2.97: 0.00: 97.03
1225039504: 0: 0: 482: 378: 158: 3: 23: 3: 579: 0.00: 13.86: 86.14
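cpurec writes one line per CPU per interval, so a system-wide view means grouping lines by timestamp. The sketch below computes an unweighted mean of User %, Sys % and Idle % across the CPUs seen at each timestamp, using the sample output above. The unweighted mean is our simplification and is only meaningful when all CPUs are equivalent.

```python
from collections import defaultdict

CPUREC_FIELDS = ["cpuid", "xcalls", "intr", "ithr", "csw", "icsw", "migr",
                 "smtx", "syscalls", "user", "sys", "idle"]


def parse_cpurec(line: str) -> tuple[int, dict]:
    """Return (timestamp, named fields) for one raw cpurec line."""
    parts = [p.strip() for p in line.strip().rstrip(":").split(":")]
    if len(parts) != 1 + len(CPUREC_FIELDS):
        raise ValueError(f"unexpected field count in {line!r}")
    return int(parts[0]), dict(zip(CPUREC_FIELDS, map(float, parts[1:])))


def system_wide(lines) -> dict:
    """Unweighted mean of user/sys/idle % over the CPUs seen at each timestamp."""
    by_ts = defaultdict(list)
    for line in lines:
        ts, rec = parse_cpurec(line)
        by_ts[ts].append(rec)
    return {
        ts: {k: sum(r[k] for r in recs) / len(recs) for k in ("user", "sys", "idle")}
        for ts, recs in sorted(by_ts.items())
    }


raw = """\
1225039500: 1: 97: 568: 49: 1195: 69: 190: 40: 4183: 5.95: 4.37: 89.68
1225039500: 0: 98: 936: 513: 1169: 47: 190: 41: 4097: 6.17: 4.62: 89.21
1225039504: 1: 0: 90: 3: 219: 1: 28: 2: 289: 2.97: 0.00: 97.03
1225039504: 0: 0: 482: 378: 158: 3: 23: 3: 579: 0.00: 13.86: 86.14"""

for ts, avg in system_wide(raw.splitlines()).items():
    print(ts, {k: round(v, 2) for k, v in avg.items()})
```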

nicrec

nicrec is part of the K9Toolkit, written by Brendan Gregg, and prints network traffic as KB/s read and written. The recorder outputs its data in RRD format.

nicrec is used to observe the KB/s transferred on all network cards, including packet counts and average packet sizes. Recording points:

The output from nicrec is displayed below:

Columns: timestamp, interface, read KB/s, write KB/s, rPackets/s, wPackets/s, read average, write average, Util %, Sat %
1225354256: e1000g0: 72.29: 3.07: 72.09: 41.56: 1026.89: 75.58: 0.06: 0.00:
1225354256: mac: 72.29: 3.07: 72.09: 41.56: 1026.89: 75.58: 0.06: 0.00:
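Unlike sysrec, each nicrec line carries a non-numeric interface name and ends with a trailing colon. The sketch below parses such a line and adds a combined read plus write throughput. The field names and the `total_kbs` derived value are our own; the split also assumes interface names contain no colon, which would not hold for a logical interface name such as e1000g0:1.

```python
NICREC_FIELDS = ["read_kbs", "write_kbs", "read_pkts", "write_pkts",
                 "read_avg", "write_avg", "util", "sat"]


def parse_nicrec(line: str) -> dict:
    """Parse 'ts: iface: v1: ...: v8:'; nicrec lines end with a colon."""
    # Assumes the interface name contains no ':' (a logical interface
    # name such as e1000g0:1 would break this simple split).
    parts = [p.strip() for p in line.strip().rstrip(":").split(":")]
    if len(parts) != 2 + len(NICREC_FIELDS):
        raise ValueError(f"unexpected field count in {line!r}")
    rec = {"timestamp": int(parts[0]), "interface": parts[1]}
    rec.update(zip(NICREC_FIELDS, map(float, parts[2:])))
    rec["total_kbs"] = rec["read_kbs"] + rec["write_kbs"]
    return rec


line = "1225354256: e1000g0: 72.29: 3.07: 72.09: 41.56: 1026.89: 75.58: 0.06: 0.00:"
nic = parse_nicrec(line)
print(nic["interface"], round(nic["total_kbs"], 2))
```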

netrec

netrec reports TCP, UDP and IP statistics from a running local or global zone. If the system deploys one or more zones and all of them share the same TCP/IP stack, you can use the -s flag to report the numbers just once. The recorder outputs its data in RRD format.

netrec is used to observe TCP, UDP and IP counters. Recording points:

The output from netrec is displayed below:

Note: when observing zones that share a TCP/IP stack, use the -s flag to avoid reporting the same data for every zone.

zonerec

zonerec is a simple script that calls the prstat utility to report zone utilisation in human-readable format. This data still needs to be parsed and converted to RRD format; a future version will include a new recorder that outputs its data directly in RRD format.

zonerec is used to observe CPU and memory utilisation, as reported by prstat.
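Since zonerec's prstat output still has to be converted, here is a minimal sketch of that parsing step, assuming the per-zone summary rows printed by `prstat -Z`. The parser maps each row onto its header line rather than fixed positions, because the exact column set varies between Solaris releases. The header and row below are illustrative, not captured SDR output, and the `ts: zone: cpu%: mem%` raw-line layout is a hypothetical choice modelled on the other recorders.

```python
def parse_zone_row(header: str, row: str) -> dict:
    """Map one `prstat -Z` zone summary row onto the column names in `header`."""
    cols, vals = header.split(), row.split()
    if len(cols) != len(vals):
        raise ValueError("row does not match header")
    rec = dict(zip(cols, vals))
    return {
        "zone": rec["ZONE"],
        "cpu_pct": float(rec["CPU"].rstrip("%")),
        "mem_pct": float(rec["MEMORY"].rstrip("%")),
    }


def to_raw_line(timestamp: int, zone: dict) -> str:
    """Format a hypothetical 'ts: zone: cpu%: mem%' raw line."""
    return f"{timestamp}: {zone['zone']}: {zone['cpu_pct']:.2f}: {zone['mem_pct']:.2f}"


# Illustrative header and row only; the exact columns differ between releases.
HEADER = "ZONEID    NPROC  SIZE   RSS MEMORY      TIME  CPU ZONE"
ROW = "     0       54  182M  116M    11%   0:00:32 0.2% global"

print(to_raw_line(1225038537, parse_zone_row(HEADER, ROW)))
```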

The output from zonerec is displayed below:

corerec

corerec uses corestat, from Cooltools. Its output is human-readable and requires parsing and proper formatting for RRD.

corerec is used to observe core utilisation on a T1 or T2 processor. Because the T1 and T2 use different registers to track usage, each needs its own corestat: corestat.t1 for T1 processors and corestat.t2 for T2. Recording points:
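Choosing between corestat.t1 and corestat.t2 can be automated. The sketch below assumes the processor is identified from `psrinfo -pv` output containing a name such as "UltraSPARC-T1"; this detection method is our assumption, not necessarily how corerec decides. Anything other than a plain T1 or T2 (including "UltraSPARC-T2+") is rejected rather than guessed.

```python
import re

# Map a processor name, as it might appear in `psrinfo -pv` output, to the
# matching corestat variant. The negative lookahead keeps e.g.
# "UltraSPARC-T2+" from being treated as a plain T2.
_VARIANTS = [
    (re.compile(r"UltraSPARC-T1(?![\w+])"), "corestat.t1"),
    (re.compile(r"UltraSPARC-T2(?![\w+])"), "corestat.t2"),
]


def corestat_for(psrinfo_output: str) -> str:
    """Return the corestat script for the T1 or T2 processor described."""
    for pattern, script in _VARIANTS:
        if pattern.search(psrinfo_output):
            return script
    raise ValueError("unrecognised processor: expected UltraSPARC-T1 or -T2")


print(corestat_for("  UltraSPARC-T1 (cpuid 0 clock 1000 MHz)"))
```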

The output from corerec is displayed below:

Note that the utilisation of a T1 or T2 processor cannot be read from vmstat or mpstat alone; use corerec to gather correct utilisation figures. See Ravindra Talashikar's notes below about mpstat and vmstat on T1 processors.

jvmrec

jvmrec is based on jstat, part of the JDK, and extracts garbage collection statistics from running virtual machines. The recorder loops over all running zones on the system, finds each Java process and extracts its GC numbers. The recorder outputs its data in RRD format.

jvmrec records GC statistics that help you understand how your JVMs are running. Recording points:

The output from jvmrec is displayed below:

Columns: zone.pid, timestamp, S0%, S1%, Eden%, Old%, Perm%, No.mGC, Time.mGC, No.MGC, Time.MGC, Total GC
global.23699: 1225360607: 0.00: 29.51: 10.03: 10.07: 60.44: 9: 0.100: 1: 0.058: 0.158
global.23699: 1225360668: 0.00: 29.51: 12.76: 10.07: 60.44: 9: 0.100: 1: 0.058: 0.158
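The first jvmrec field combines zone and PID, and the last one is a derived total. The sketch below splits `zone.pid`, types the counters, and checks that Total GC equals Time.mGC + Time.MGC, mirroring jstat -gcutil, where GCT is the sum of YGCT and FGCT. Reading mGC/MGC as minor (young) and major (full) collections is our interpretation of the column names, and the field names and 0.002 s rounding tolerance are our own choices.

```python
JVMREC_FIELDS = ["s0", "s1", "eden", "old", "perm",
                 "minor_gc", "minor_gc_time", "major_gc", "major_gc_time",
                 "total_gc_time"]


def parse_jvmrec(line: str) -> dict:
    """Parse 'zone.pid: ts: S0%: ...: Total GC' into named fields."""
    parts = [p.strip() for p in line.strip().rstrip(":").split(":")]
    if len(parts) != 2 + len(JVMREC_FIELDS):
        raise ValueError(f"unexpected field count in {line!r}")
    # A PID never contains '.', so splitting on the last dot is safe
    # even if a zone name does.
    zone, _, pid = parts[0].rpartition(".")
    rec = {"zone": zone, "pid": int(pid), "timestamp": int(parts[1])}
    for name, raw in zip(JVMREC_FIELDS, parts[2:]):
        rec[name] = int(raw) if name in ("minor_gc", "major_gc") else float(raw)
    return rec


def gc_time_consistent(rec: dict, tol: float = 0.002) -> bool:
    """Check Total GC == Time.mGC + Time.MGC, allowing for 3-decimal rounding."""
    total = rec["minor_gc_time"] + rec["major_gc_time"]
    return abs(total - rec["total_gc_time"]) <= tol


rec = parse_jvmrec("global.23699: 1225360607: 0.00: 29.51: 10.03: 10.07: 60.44: 9: 0.100: 1: 0.058: 0.158")
print(rec["zone"], rec["pid"], rec["total_gc_time"], gc_time_consistent(rec))
```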

Installation

SDR is a simple collection of scripts, easy to install and set up on Solaris 10 systems. The recorders run under the Solaris Service Management Facility (SMF), a convenient interface for running services on Solaris 10: each recorder is monitored by SMF and restarted when needed. On releases older than Solaris 10 you need to set up rc scripts yourself.

Current Version

Current version: 0.63
Recording
  • cpurec: improved reporting of user/sys/idle
  • netrec: new features available for systems with no zones deployed
  • raw2day: SSH and FTP connectors to automate sending the logs to the reporting site
  • sender: experimental job to send data to the reporting side every N minutes
  • hdwRec: hardware recorder improvements
Reporting
  • Solaris x86 and SPARC packages: ready-made packages for the reporting side will be published soon
Release notes: ChangeLog 0.63

Future version: 0.70
Recording
  • sysrec: takes ZFS ARC usage into account
Release notes: ChangeLog 0.70

SDR package

SAR

SDR uses SAR, the system activity reporter. SAR is started through SMF; the main steps to get SAR started are:

Note: SDR can run without SAR if needed. The system activity reporter collects a wide range of information from the server that can be sent to the reporting side: paging activity, kernel memory allocation activity, message and semaphore activity, etc.

SMF

Log Rotation

Maintenance



SDR - Frequently Asked Questions

General

Recording

References


This document is Copyright (c) 2009 Stefan Parvu
Document License: PDL