Nagios check for Centreon to create CPU graphs for Linux

Centreon is a great front-end for Nagios, the well-known monitoring tool.

Nagios itself only performs "up" and "down" checks; Centreon adds performance graphing capabilities.

Centreon comes with many checks that measure values, like traffic on eth0, ping response time, an NTP check and so on. Many checks are based on SNMP, so net-snmp should be installed on the monitored Linux machines.

What Centreon is missing is an SNMP check that reports CPU usage so it can be graphed. Here is a shell script that queries a specified host and returns the CPU idle, CPU system and CPU user percentages.
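The ssCpuRawIdle, ssCpuRawUser and ssCpuRawSystem values from UCD-SNMP-MIB are counters that keep growing, so a single reading says nothing about current load. The script therefore saves each reading in a file and works with the difference between two runs. A minimal sketch of that arithmetic, using a hypothetical helper `cpu_percent` and made-up counter values; like the script, it only considers idle, user and system time:

```shell
#!/bin/sh
# Hypothetical helper, not part of the plugin: given the previous and the
# current raw idle, user and system counters, print the share of each in the
# time that passed between the two readings (integer percentages).
cpu_percent() {
  idlediff=$(($4 - $1))
  userdiff=$(($5 - $2))
  systemdiff=$(($6 - $3))
  total=$((idlediff + userdiff + systemdiff))
  # Counters that did not move (or wrapped around) would cause a division by zero.
  if [ "$total" -le 0 ] ; then
    echo "no usable difference"
    return 3
  fi
  echo "idle=$((idlediff * 100 / total))% user=$((userdiff * 100 / total))% system=$((systemdiff * 100 / total))%"
}

# Made-up readings: previous idle/user/system, then current idle/user/system.
cpu_percent 1000 200 100 1900 250 150
```

Here 900 of the 1000 ticks that passed were idle, so the call prints `idle=90% user=5% system=5%`.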

The script depends on the snmpget binary, found in the net-snmp-utils package. Install it on the Nagios pollers that perform this check.
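snmpget prints one line per requested object in the form `MIB::object = TYPE: value`, and the script's `while read mib equals type digit` loop splits each line into those four words. To see that parsing without a live host, here is a sketch that feeds the same kind of loop with sample text; the `parse_snmp` function name and the counter values are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: reads snmpget-style lines on stdin and splits each one
# the way the plugin does: object name, "=", type, value.
parse_snmp() {
  while read mib equals type digit ; do
    case "$mib" in
      UCD-SNMP-MIB::ssCpuRawIdle.0) echo "idle counter: $digit" ;;
      UCD-SNMP-MIB::ssCpuRawUser.0) echo "user counter: $digit" ;;
      UCD-SNMP-MIB::ssCpuRawSystem.0) echo "system counter: $digit" ;;
    esac
  done
}

# Sample text in the format snmpget prints (the numbers are illustrative).
sample='UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 14766000
UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 52000
UCD-SNMP-MIB::ssCpuRawSystem.0 = Counter32: 31000'

printf '%s\n' "$sample" | parse_snmp
```

The `equals` and `type` words ("=" and "Counter32:") are read only to get them out of the way, so that `digit` holds just the number.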

This script implements Performance Data as described by Nagios, which in very short means that, besides printing human-readable output, it also prints performance data after the pipe ("|") character, as label=value pairs separated by spaces.
The script has been designed to meet all the requirements Nagios describes for plugins.
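In practice the plugin prints a single line: readable text, a "|", then the label=value pairs, for example `CPU OK idle value: 99%|cpuidle=99% cpuuservalue=0% cpusystemvalue=0%`. A small sketch with a hypothetical `nagios_output` helper that assembles such a line and then splits it again with POSIX parameter expansion:

```shell
#!/bin/sh
# Hypothetical helper: nagios_output TEXT LABEL=VALUE...
# Prints "TEXT|LABEL=VALUE LABEL=VALUE ...": "$*" joins the remaining
# arguments with a space (the first character of the default IFS).
nagios_output() {
  text="$1"
  shift
  printf '%s|%s\n' "$text" "$*"
}

line=$(nagios_output "CPU OK idle value: 99%" "cpuidle=99%" "cpuuservalue=0%" "cpusystemvalue=0%")
echo "$line"
# Everything before the first "|" is the status text, everything after it is perfdata.
echo "text: ${line%%|*}"
echo "perfdata: ${line#*|}"
```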

#!/bin/sh

# Nagios plugin to report CPU usage on Linux boxes.

usage() {
# This function is called when a user enters impossible values.
echo "Usage: $0 -H HOSTADDRESS [-C COMMUNITY] [-w WARNING] [-c CRITICAL] [-v VERSION]"
echo
echo " -H HOSTADDRESS"
echo "     The host to check, either IP address or a resolvable hostname."
echo " -w WARNING"
echo "     Warn when the CPU idle percentage drops below this value, defaults to 15."
echo " -c CRITICAL"
echo "     Report critical when the CPU idle percentage drops below this value, defaults to 5."
echo " -C COMMUNITY"
echo "     The SNMP community to use, defaults to public."
echo " -v VERSION"
echo "     The SNMP version to use, defaults to 2c."
exit 3
}

readargs() {
# This function reads what options and arguments were given on the
# command line.
while [ "$#" -gt 0 ] ; do
  case "$1" in
   -H)
    if [ "$2" ] ; then
     host="$2"
     shift ; shift
    else
     echo "Missing a value for $1."
     echo
     shift
     usage
    fi
   ;;
   -w)
    if [ "$2" ] ; then
     warning="$2"
     shift ; shift
    else
     echo "Missing a value for $1."
     echo
     shift
     usage
    fi
   ;;
   -c)
    if [ "$2" ] ; then
     critical="$2"
     shift ; shift
    else
     echo "Missing a value for $1."
     echo
     shift
     usage
    fi
   ;;
   -C)
    if [ "$2" ] ; then
     community="$2"
     shift ; shift
    else
     echo "Missing a value for $1."
     echo
     shift
     usage
    fi
   ;;
   -v)
    if [ "$2" ] ; then
     version="$2"
     shift ; shift
    else
     echo "Missing a value for $1."
     echo
     shift
     usage
    fi
   ;;
   *)
    echo "Unknown option $1."
    echo
    shift
    usage
   ;;
  esac
done
}

setvariables() {
# Here is a function to set some default values.
cpurawidle="UCD-SNMP-MIB::ssCpuRawIdle.0"
cpurawuser="UCD-SNMP-MIB::ssCpuRawUser.0"
cpurawsystem="UCD-SNMP-MIB::ssCpuRawSystem.0"
if [ ! "$warning" ] ; then warning="15" ; fi
if [ ! "$critical" ] ; then critical="5" ; fi
tmpdir="/tmp/nagios"
}

checkvariables() {
# This function checks if all collected input is correct.
if [ ! "$host" ] ; then
  echo "Please specify a hostname or IP address."
  echo
  usage
fi
if [ "$warning" -lt "$critical" ] ; then
  echo "Critical may not be higher than warning. Please modify your critical and warning values."
  echo
  usage
fi
if [ ! "$community" ] ; then
  # The public community is used when a user did not enter a community.
  community="public"
fi
if [ ! "$version" ] ; then
  # Version 2c is used when a user did not enter a version.
  version="2c"
fi
if [ ! -d "$tmpdir" ] ; then
  if ! mkdir "$tmpdir" ; then
   echo "UNKNOWN: cannot create $tmpdir!"
   exit 3
  fi
fi
}

getandprintresults() {
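# First, get all values in one snmpget session. I think this is lighter for
# the queried machine than three separate snmpget calls.
snmpresult=$(snmpget -c "$community" -v "$version" -t 3 "$host" "$cpurawidle" "$cpurawuser" "$cpurawsystem")
if [ $? -ne 0 ] || [ ! "$snmpresult" ] ; then
 # Without this check a failed query would print nothing and exit 0 (OK).
 echo "CPU UNKNOWN: no valid SNMP response from $host."
 exit 3
fi
echo "$snmpresult" | while read mib equals type digit ; do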
case "$mib" in
  # This output is returned for the cpuidle value.
  UCD-SNMP-MIB::ssCpuRawIdle.0)
   cpuidlevalue="$digit"
  ;;
  # This output is returned for the cpuuser value.
  UCD-SNMP-MIB::ssCpuRawUser.0)
   cpuuservalue="$digit"
  ;;
  # This output is returned for the cpusystem value.
  UCD-SNMP-MIB::ssCpuRawSystem.0)
   cpusystemvalue="$digit"

   if [ -f "$tmpdir"/"$host".cpuidle ] ; then
    cpuidlediff=$(($cpuidlevalue - $(cat "$tmpdir"/"$host".cpuidle)))
   fi
   echo "$cpuidlevalue" > "$tmpdir"/"$host".cpuidle

   if [ -f "$tmpdir"/"$host".cpuuser ] ; then
    cpuuserdiff=$(($cpuuservalue - $(cat "$tmpdir"/"$host".cpuuser)))
   fi
   echo "$cpuuservalue" > "$tmpdir"/"$host".cpuuser

   if [ ! -f "$tmpdir"/"$host".cpusystem ] ; then
    echo "$cpusystemvalue" > "$tmpdir"/"$host".cpusystem
    echo "First run, gathering data."
    exit 3
   else
    cpusystemdiff=$(($cpusystemvalue - $(cat "$tmpdir"/"$host".cpusystem)))
    echo "$cpusystemvalue" > "$tmpdir"/"$host".cpusystem
   fi

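   # Add all differences, so a calculation of the percentage can be made later.
   allcpu=$(($cpuidlediff + $cpuuserdiff + $cpusystemdiff))

   # The counters may not have moved since the last check, or may have wrapped
   # or been reset; stop here instead of dividing by zero or a negative total.
   if [ "$allcpu" -le 0 ] ; then
    echo "CPU UNKNOWN: CPU counters did not increase since the last check."
    exit 3
   fi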

   # Now calculate how many percent each value represents.
   cpuidlevalue=$((($cpuidlediff*100)/$allcpu))
   cpuuservalue=$((($cpuuserdiff*100)/$allcpu))
   cpusystemvalue=$((($cpusystemdiff*100)/$allcpu))

   # Now see if the idle percentage has dropped below a threshold.
   if [ "$cpuidlevalue" -lt "$critical" ] ; then
    # First see if it's in a critical state.
    echo "CPU CRITICAL idle value: $cpuidlevalue%|cpuidle=$cpuidlevalue% cpuuservalue=$cpuuservalue% cpusystemvalue=$cpusystemvalue%"
    exit 2
   elif [ "$cpuidlevalue" -lt "$warning" ] ; then
    # Now see if warning applies.
    echo "CPU WARNING idle value: $cpuidlevalue%|cpuidle=$cpuidlevalue% cpuuservalue=$cpuuservalue% cpusystemvalue=$cpusystemvalue%"
    exit 1
   else
    # If neither critical, nor warning apply, it must be OK!
    echo "CPU OK idle value: $cpuidlevalue%|cpuidle=$cpuidlevalue% cpuuservalue=$cpuuservalue% cpusystemvalue=$cpusystemvalue%"
    exit 0
   fi
  ;;
  esac
done
}

# The calls to the different functions.
readargs "$@"
setvariables
checkvariables
getandprintresults
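The readargs function handles every option by hand. As an alternative that the script does not use, the POSIX `getopts` builtin can do the same parsing in fewer lines. A sketch with a hypothetical `readargs_getopts` function; the `usage` stub below only stands in for the plugin's own usage function so the example runs on its own:

```shell
#!/bin/sh
# Stand-in for the plugin's usage function so this sketch is self-contained.
usage() {
  echo "Usage: $0 -H HOSTADDRESS [-C COMMUNITY] [-w WARNING] [-c CRITICAL] [-v VERSION]"
  exit 3
}

# Hypothetical getopts-based replacement for readargs. A ":" after a letter
# means that option takes a value; getopts itself reports a missing value or
# an unknown option (opt is then "?"), after which usage is shown.
readargs_getopts() {
  OPTIND=1
  while getopts "H:w:c:C:v:" opt ; do
    case "$opt" in
      H) host="$OPTARG" ;;
      w) warning="$OPTARG" ;;
      c) critical="$OPTARG" ;;
      C) community="$OPTARG" ;;
      v) version="$OPTARG" ;;
      *) usage ;;
    esac
  done
}

readargs_getopts -H skylab -C public -w 20
echo "host=$host community=$community warning=$warning"
```

One difference to keep in mind: getopts stops at the first argument that is not an option, while the original readargs rejects it with "Unknown option".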

Don't forget to make the script executable (chmod 755) on the poller(s).

Now go into the Centreon web front end and add a command under
Configuration > Commands > Add.
I named the command "check_cpu"; its command line is:

$USER1$/check_snmp_cpu -H $HOSTADDRESS$ -C $ARG1$

Bind this command to a service template, and bind a hostgroup to that service template. Remember that Centreon does not use $USER2$ for the SNMP community, but $_HOSTSNMPCOMMUNITY$.

Comments

It looks hard at the start, but this process really works. - Mark Zokle

I can't get it to work from the interface, only from the command line.

From the command line everything looks fine:

# /usr/lib/nagios/plugins/check_cpu -H skylab -C [suppress]
CPU OK idle value: 99%|cpuidle=99% cpuuservalue=0% cpusystemvalue=0%

But from the Centreon interface I only get: "First run, gathering data."

Any idea?

Update: it only works when the plugin checks a second machine.

I guess you ran it the first time as "root", and now nagios can't read or overwrite those files.

Try to remove /tmp/nagios/$host.* so that nagios can create the files.

Regards,

Robert de Bock.

I can't get the command to work; every time I run it, it gives the following error:

UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 14766000
./check_cpu_don: line 138: UCD-SNMP-MIB::ssCpuRawUser.0: command not found

Any help will be appreciated, as I need this to monitor CPU values on remote servers. Thanks

Your copy-paste likely did not work properly; some lines have been broken.

Open that file with vi, go to line 138 and check it.

To go to line 138 directly in vi, use:
vi check_cpu_don +138

Good luck!

Thanks for your plugin, it works great.

Maybe it's better to change the tmpdir, since some boxes clean /tmp after a reboot. I've changed it to /var/nagiostmp/cpu/:

tmpdir="/var/nagiostmp/cpu"
if [ ! -d "$tmpdir" ] ; then
  mkdir -p "$tmpdir"
  if [ $? -gt 0 ] ; then
   echo "Unknown cannot create $tmpdir!"
   exit 3
  fi
fi

Hi, when I execute the plugin I sometimes get this error: line 171: (0*100)/0: division by 0 (error token is "0")
Any ideas?
