Centreon is a great front-end for Nagios, the well known monitoring tool.
Nagios only performs "up" and "down" checks, Centreon adds performance graph capabilities.
Centreon comes with many checks to measure values, like traffic on eth0, a ping response time check, an ntp check and so on. Many checks are based on SNMP, so for Linux machines net-snmp should be installed.
What Centreon is missing, is an SNMP check that reports CPU usage and graphs the information. Here is a shell script to get the values for a specified host and return the CPU-idle percentage, CPU-system percentage and CPU-user percentage.
The script depends on a binary snmpget found in the package net-snmp-utils. Install it on the Nagios pollers that perform this check.
This script implements Performance Data as described by Nagios, which is very short means that besides outputting readable data, it also outputs performance data after the pipe ("|") mark, separated by a comma.
The script has been designed to incorporate all described requirements by Nagios plugins.
#!/bin/sh
# Nagios plugin to report CPU usage on Linux boxes.
usage() {
# This function is called when a user enters impossible values.
echo "Usage: $0 -H HOSTADDRESS [-C COMMUNITY] [-w WARNING] [-c CRITICAL] [-v VERSION]"
echo
echo " -H HOSTADDRESS"
echo " The host to check, either IP address or a resolvable hostname."
echo " -w WARNING"
echo " The percentage of cpu-idle to start warning, defaults to 15."
echo " -c CRITICAL"
echo " The percentage op cpu-idle to reflect a critical state, defaults to 5."
echo " -C COMMUNITY"
echo " The SNMP community to use, defaults to public."
echo " -v VERSION"
echo " The SNMTP version to use, defaults to 2c."
exit 3
}
readargs() {
# This function reads what options and arguments were given on the
# command line.
while [ "$#" -gt 0 ] ; do
case "$1" in
-H)
if [ "$2" ] ; then
host="$2"
shift ; shift
else
echo "Missing a value for $1."
echo
shift
usage
fi
;;
-w)
if [ "$2" ] ; then
warning="$2"
shift ; shift
else
echo "Missing a value for $1."
echo
shift
usage
fi
;;
-c)
if [ "$2" ] ; then
critical="$2"
shift ; shift
else
echo "Missing a value for $1."
echo
shift
usage
fi
;;
-C)
if [ "$2" ] ; then
community="$2"
shift ; shift
else
echo "Missing a value for $1."
echo
shift
usage
fi
;;
-v)
if [ "$2" ] ; then
version="$2"
shift ; shift
else
echo "Missing a value for $1."
echo
shift
usage
fi
;;
*)
echo "Unknown option $1."
echo
shift
usage
;;
esac
done
}
setvariables() {
# Here is a function to set some default values.
cpurawidle="UCD-SNMP-MIB::ssCpuRawIdle.0"
cpurawuser="UCD-SNMP-MIB::ssCpuRawUser.0"
cpurawsystem="UCD-SNMP-MIB::ssCpuRawSystem.0"
if [ ! "$warning" ] ; then warning="15" ; fi
if [ ! "$critical" ] ; then critical="5" ; fi
tmpdir="/tmp/nagios"
}
checkvariables() {
# This function checks if all collected input is correct.
if [ ! "$host" ] ; then
echo "Please specify a hostname or IP address."
echo
usage
fi
if [ "$warning" -lt "$critical" ] ; then
echo "Critical may not be higher than warning. Please modify your critical an warning values."
echo
usage
fi
if [ ! "$community" ] ; then
# The public community is used when a user did not enter a community.
community="public"
fi
if [ ! "$version" ] ; then
# Version 2c is used when a user did not enter a version.
version="2c"
fi
if [ ! -d "$tmpdir" ] ; then
mkdir "$tmpdir"
if [ $? -gt 0 ] ; then
echo "Unknown cannot create $tmpdir!"
exit 3
fi
fi
}
getandprintresults() {
# First, get all values in one snmpget session. I think this is lighter for
# the machine that is queried compared to three separated snmpgets.
snmpget -c "$community" -v "$version" -t 3 "$host" "$cpurawidle" "$cpurawuser" "$cpurawsystem" | while read mib equals type digit ; do
case "$mib" in
# This output is returned for the cpuidle value.
UCD-SNMP-MIB::ssCpuRawIdle.0)
cpuidlevalue="$digit"
;;
# This output is returned for the cpuuser value.
UCD-SNMP-MIB::ssCpuRawUser.0)
cpuuservalue="$digit"
;;
# This output is returned for the cpusystem value.
UCD-SNMP-MIB::ssCpuRawSystem.0)
cpusystemvalue="$digit"
if [ -f "$tmpdir"/"$host".cpuidle ] ; then
cpuidlediff=$(($cpuidlevalue - $(cat "$tmpdir"/"$host".cpuidle)))
fi
echo "$cpuidlevalue" > "$tmpdir"/"$host".cpuidle
if [ -f "$tmpdir"/"$host".cpuuser ] ; then
cpuuserdiff=$(($cpuuservalue - $(cat "$tmpdir"/"$host".cpuuser)))
fi
echo "$cpuuservalue" > "$tmpdir"/"$host".cpuuser
if [ ! -f "$tmpdir"/"$host".cpusystem ] ; then
echo "$cpusystemvalue" > "$tmpdir"/"$host".cpusystem
echo "First run, gathering data."
exit 3
else
cpusystemdiff=$(($cpusystemvalue - $(cat "$tmpdir"/"$host".cpusystem)))
echo "$cpusystemvalue" > "$tmpdir"/"$host".cpusystem
fi
# Add all differences, so a calculation of the percentage can be made later.
allcpu=$(($cpuidlediff + $cpuuserdiff + $cpusystemdiff))
# Now calculate how many percent each value represents.
cpuidlevalue=$((($cpuidlediff*100)/$allcpu))
cpuuservalue=$((($cpuuserdiff*100)/$allcpu))
cpusystemvalue=$((($cpusystemdiff*100)/$allcpu))
# Now see if any of these percentages is over a threshold.
if [ "$cpuidlevalue" -lt "$critical" ] ; then
# First see if it's in a critical state.
echo "CPU CRITICAL idle value: $cpuidlevalue%|cpuidle=$cpuidlevalue% cpuuservalue=$cpuuservalue% cpusystemvalue=$cpusystemvalue%"
exit 2
elif [ "$cpuidlevalue" -lt "$warning" ] ; then
# Now see if warning applies.
echo "CPU WARNING idle value: $cpuidlevalue%|cpuidle=$cpuidlevalue% cpuuservalue=$cpuuservalue% cpusystemvalue=$cpusystemvalue%"
exit 1
else
# If neither critical, nor warning apply, it must be OK!
echo "CPU OK idle value: $cpuidlevalue%|cpuidle=$cpuidlevalue% cpuuservalue=$cpuuservalue% cpusystemvalue=$cpusystemvalue%"
exit 0
fi
;;
esac
done
}
# The calls to the different functions.
readargs "$@"
setvariables
checkvariables
getandprintresultsDon't forget to chmod (755) the script on the Poller(s).
Now go into the Centreon web front end and add a command at:
Configuration - Commands - Add.
I named the check "check_cpu" where the command line is:
$USER1$/check_snmp_cpu -H $HOSTADDRESS$ -C $ARG1$Bind this check to a service template and bind a hostgroup to the service template. Remember Centreon does not use $USER2$, but $_HOSTSNMPCOMMUNITY$.
| About | Consultancy | Articles | Contact |
|
|
|
|
|
| References | Red Hat Certified Architect | By Robert de Bock | Robert de Bock |
| Curriculum Vitae | By Fred Clausen | +31 6 14 39 58 72 | |
| By Nelson Manning | robert@meinit.nl |
Comments
Can't get it to work from
Can't get it to work from interface, only from command line.
From command line everything looks fine:
root@bush plugins]# /usr/lib/nagios/plugins/check_cpu -H skylab -C [suppress]
CPU OK idle value: 99%|cpuidle=99% cpuuservalue=0% cpusystemvalue=0%
But from Centreon interface only got: "First run, gathering data."
Any idea?
Update: Works only when
Update: Works only when plugin checks for a second machine.
I guess you ran it first time
I guess you ran it first time as "root", and now nagios can't read or append to those files.
Try to remove /tmp/nagios/$host.* so that nagios can create the files.
Regards,
Robert de Bock.
I can't get the command to
I can't get the command to work, every time i run the command it gives the following error:
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 14766000
./check_cpu_don: line 138: UCD-SNMP-MIB::ssCpuRawUser.0: command not found
Please any help will be appreciated as i need this to monitor CPU vales on remote servers. Thanks
Your copy paste likely did
Your copy paste likely did not work properly, some lines have been broken.
Open that file with vi, go to line 138 and check it.
To go to line 138 directly in vi, use:
vi check_cpu_don +138
Good luck!
Thanks for your plugin, works
Thanks for your plugin, works great.
Maybe it's better to change the tmpdir, some boxes clean this directory after reboot. I've changed this to /var/nagiostmp/cpu/:
tmpdir="/var/nagiostmp/cpu"if [ ! -d "$tmpdir" ] ; then
mkdir -p "$tmpdir"
if [ $? -gt 0 ] ; then
echo "Unknown cannot create $tmpdir!"
exit 3
fi
fi