PNP requires valid performance data from Nagios plugins.
So what is this performance data?
Up to Nagios 2.x the output of a Nagios plugin is limited to one line. When the plugin produces performance data, the output is divided into two parts. The pipe symbol (“|”) is used as a delimiter.
Example check_icmp :
OK - 127.0.0.1: rta 2.687ms, lost 0% | rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;;
resulting in the text on the left side of the pipe symbol
OK - 127.0.0.1: rta 2.687ms, lost 0%
and the performance data
rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;;
Performance data is designed for automatic processing. The format is specified within the Developer Guidelines but should be exemplified here nonetheless:
rta=2.687ms;3000.000;5000.000;0;
 |    |  |    |         |     | |
 |----|--|----|---------|-----|-|----- * label
      |--|----|---------|-----|-|----- * current value
         |----|---------|-----|-|----- unit ( UOM = UNIT of Measurement )
              |---------|-----|-|----- warning threshold
                        |-----|-|----- critical threshold
                              |-|----- minimum value
                                |----- maximum value
Values marked with * are mandatory. All other values are optional.
Several data series are separated by blanks. The actual data must not contain any blanks. If the label contains blanks, it has to be surrounded by single quotes.
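The format rules above can be sketched in a few lines of Python. This is only an illustration and not part of PNP; the function name parse_perfdata and the returned dictionary keys are made up for this example, and thresholds are kept as plain strings because they may be ranges.

```python
import re
import shlex

# value followed by an optional unit of measurement, e.g. "2.687ms" or "0%"
_VALUE = re.compile(r"^(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^\d;]*)$")


def parse_perfdata(perfdata):
    """Split a performance data string into one dict per data series."""
    series = []
    # shlex.split honours single quotes, so labels containing blanks survive
    for token in shlex.split(perfdata):
        label, _, rest = token.rpartition("=")
        fields = rest.split(";")
        match = _VALUE.match(fields[0])
        if not label or match is None:
            raise ValueError(f"invalid performance data: {token!r}")
        # pad the optional fields so that all four are always present
        warn, crit, minimum, maximum = (fields[1:5] + [""] * 4)[:4]
        series.append({
            "label": label,
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": warn or None,
            "crit": crit or None,
            "min": minimum or None,
            "max": maximum or None,
        })
    return series


for item in parse_perfdata("rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;;"):
    print(item)
```

Only the label and the current value are required, so every other key may be None.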
PNP is licensed under GPL 2
Development of PNP is organized using Sourceforge.Net. PNP is registered under “PNP4nagios”.
The current stable version can be found in the download area: Sourceforge Download
If you want to be up to date you can use the latest developer version which is generated automatically on a daily basis from the SVN repository.
Before asking support questions please make sure that you have checked the items described under verify your installation.
German support is available on http://nagios-portal.de. The developers will be informed about new postings in the PNP-section. Postings in English will be answered as well. Please use the “search” function first.
After registering as a user please fill out the profile regarding operating system and PNP version used. Please mention if you used a package or compiled the sources.
Please mark successfully solved threads by adding ”[solved]” to the title as it helps other users to find a solution for their problem.
The mailing lists on Sourceforge can be used to request support (and are limited to English):
pnp4nagios-users: users list for general questions regarding configuration. Please state your operating system and PNP version
pnp4nagios-devel: devel list for suggestions and error reports. Please state your operating system and PNP version
pnp4nagios-checkins: the checkin list automatically contains changes to the SVN repository
Performance data will be stored in Round Robin Databases using RRDtool. That means that after some time the oldest data will be dropped at the “end” and it will be replaced by new values “at the beginning”.
Various intervals provide different resolutions. With the defaults, data is stored at a resolution of one minute for the last two days, five minutes for ten days, 30 minutes for 90 days and 6 hours for four years. The increasing interval causes averaging of the data, which leads to smaller maximum values. This is not an error in PNP.
Using this storage format the size of the files will stay the same over time. Per datasource you will need approx. 400 KB.
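The figure of roughly 400 KB can be checked with a short calculation. The sketch below uses the four default resolutions named above; the number of consolidation functions (three: AVERAGE, MAX and MIN) and the size of one stored value (8 bytes) are assumptions made for this example, and file headers are ignored.

```python
# (step in minutes, retention in days) for the four default resolutions
ARCHIVES = [(1, 2), (5, 10), (30, 90), (360, 4 * 365)]
CONSOLIDATIONS = 3   # assumption: AVERAGE, MAX and MIN are each stored
BYTES_PER_VALUE = 8  # assumption: one double-precision value per row


def rows(step_minutes, days):
    """Number of rows needed to cover `days` at one row per `step_minutes`."""
    return days * 24 * 60 // step_minutes


def datasource_bytes():
    """Approximate storage needed for a single datasource."""
    total_rows = sum(rows(step, days) for step, days in ARCHIVES)
    return total_rows * CONSOLIDATIONS * BYTES_PER_VALUE


print([rows(step, days) for step, days in ARCHIVES])
print(datasource_bytes() / 1024, "KiB per datasource")
```

Under these assumptions the result is a little under 400 KB, and it never grows because the number of rows is fixed.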
PNP supports several modes to process performance data. The modes differ in complexity and the performance to be expected.
The following image shows the connections between Nagios, PNP and RRDtool
Nagios invokes a command for every host and every service whose performance data should be processed. Depending on the mode you choose the data will be passed to process_perfdata.pl or will be written to temporary files and processed at a later time. process_perfdata.pl writes the data to XML files and stores them in RRD files using RRDtool.
Before you choose a mode please read the documentation and decide which way will be the best for installation.
The “default mode” is the simplest and easiest to set up. Nagios will call the Perl script process_perfdata.pl for every service to process the data. The default mode works well up to about 2,000 services in a 5 minute interval.
In bulk mode nagios writes the necessary data to a temporary file. After expiration of a defined time the file will be processed in one piece and deleted afterwards.
The number of calls of process_perfdata.pl will be reduced many times over. Depending on the time and the amount of collected data there will be far fewer system calls. In return, each run of process_perfdata.pl will take longer.
Note: When using this mode you should keep an eye on the runtime of process_perfdata.pl. While it is processing data, Nagios will not execute any checks.
snippet of var/perfdata.log:
2007-10-18 12:05:01 [21138] 71 Lines processed
2007-10-18 12:05:01 [21138] .../spool/service-perfdata-1192701894-PID-21138 deleted
2007-10-18 12:05:01 [21138] PNP exiting (runtime 0.060969s)
...
71 lines were processed in 0.06 seconds. This corresponds to the data volume of about 2000 services when processing at a 10 second interval. It means Nagios was blocked for only about 0.06 seconds.
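To keep an eye on the runtime without reading the log by hand, the "PNP exiting" lines can be filtered. The following is a minimal sketch that assumes the log format shown in the snippet above; the function name slow_runs and the chosen limit are hypothetical.

```python
import re

# matches e.g. "2007-10-18 12:05:01 [21138] PNP exiting (runtime 0.060969s)"
RUNTIME = re.compile(r"\[(\d+)\] PNP exiting \(runtime ([\d.]+)s\)")


def slow_runs(log_lines, limit_seconds):
    """Return (pid, runtime) pairs for runs that took longer than the limit."""
    result = []
    for line in log_lines:
        match = RUNTIME.search(line)
        if match and float(match.group(2)) > limit_seconds:
            result.append((int(match.group(1)), float(match.group(2))))
    return result


sample = [
    "2007-10-18 12:05:01 [21138] 71 Lines processed",
    "2007-10-18 12:05:01 [21138] PNP exiting (runtime 0.060969s)",
]
print(slow_runs(sample, limit_seconds=0.05))
```

In practice the lines would be read from var/perfdata.log, and the limit would be chosen well below the processing interval.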
Bulk mode with NPCD is the most complicated mode, but also the one with the best performance.
Nagios again uses a temporary file to store the data and executes a command after expiration of a certain time. Instead of immediate processing by process_perfdata.pl the file is moved to a spool directory. As moving a file inside the same filesystem takes almost no time, Nagios is able to resume its own work immediately.
The NPCD daemon (Nagios Performance C Daemon) will monitor the directory for new files and will pass the names to process_perfdata.pl. Processing of performance data is decoupled completely from Nagios. NPCD itself is able to start multiple threads for processing the data.
Which mode you choose will depend on the size of your Nagios installation. You will find these terms throughout the documentation.
The installation of PNP will be described in more detail. It is expected that nagios was compiled from source and is located in /usr/local/nagios.
Attention: The description applies to PNP 0.4.14. Hints to the current version can be found here.
Please note that PNP has to be configured after the installation.
The installation of PNP is controlled by makefiles. The system is analyzed after invocation of ./configure and the detected values are transferred to makefiles.
Please unpack PNP as user root:
cd /usr/local/src
wget http://downloads.sourceforge.net/pnp4nagios/pnp-0.4.14.tar.gz
tar -xvzf pnp-0.4.14.tar.gz
cd pnp-0.4.14
./configure is to be called from the directory pnp-<version> (pnp-0.4.14 in our case).
./configure
Some lines run across the screen. The output at the end is important.
*** Configuration summary for pnp 0.4.14 ***

  General Options:
  -------------------------         -------------------
  Nagios user/group:                nagios nagios
  Install directory:                /usr/local/nagios
  HTML Dir:                         /usr/local/nagios/share/pnp
  Config Dir:                       /usr/local/nagios/etc/pnp
  Path to rrdtool:                  /usr/bin/rrdtool (Version 1.2.15)
  RRDs Perl Modules:                FOUND (Version 1.2015)
  RRD Files stored in:              /usr/local/nagios/share/perfdata
  process_perfdata.pl Logfile:      /usr/local/nagios/var/perfdata.log
  Perfdata files (NPCD) stored in:  /usr/local/nagios/var/spool/perfdata/
The paths shown should be checked. If the displayed values aren't correct you can change them by calling ./configure with appropriate options.
Attention: “Path to rrdtool” means path including name of binary! If necessary it can be specified using the following syntax:
./configure --with-rrdtool=/usr/local/rrdtool-1.2.xx/bin/rrdtool
./configure --help
shows the supported options.
Attention: If Nagios isn't installed under /usr/local/nagios, and especially when using preconfigured Nagios packages, the use of ./configure --prefix=<Nagios-Home> will most likely NOT be sufficient to install PNP correctly. Please have a careful look at the options at the end of the configure help page!
Example for Icinga
USER=icinga
GROUP=icinga
PREFIX=/usr/local/icinga
./configure \
    --with-nagios-user=$USER \
    --with-nagios-group=$GROUP \
    --prefix=$PREFIX \
    --datarootdir=$PREFIX/share/pnp \
    --with-rrdtool=/usr/bin/rrdtool \
    --sysconfdir=$PREFIX/etc/pnp \
    --with-perfdata-dir=$PREFIX/share/perfdata \
    --with-perfdata-logfile=$PREFIX/var/perfdata.log \
    --with-perfdata-spool-dir=$PREFIX/var/spool/perfdata
Invoking
make all
compiles the components like NPCD which are written in C
make install
copies everything to the right places in the file system. The paths were already shown during ./configure.
You can call
make install-config
optionally. This way config files for process_perfdata.pl and npcd are copied to etc/pnp.
To install the NPCD Init script call
make install-init
All these steps are combined in
make fullinstall
The update works the same way as an installation. Please note that you have to call ./configure with the same options you used during the first installation.
Please check if you changed anything in the folder share/pnp/templates.dist. Own templates should be placed in share/pnp/templates to avoid being overwritten.
Attention: If you changed config.php then you should save this file before it is overwritten when you execute “make install-config”.
PNP isn't available as an official debian package yet. Sven Velt is working hard to change that.
You will find debian packages on http://www.velt.de/tags/nagios-pnp
After installation some components of PNP were copied to the appropriate places in the file system. These are
the PHP-Files for the web-frontend in
/usr/local/nagios/share/pnp
the data collector process_perfdata.pl in
/usr/local/nagios/libexec
sample config files with the suffix -sample in
/usr/local/nagios/etc/pnp
the config file config.php for the web frontend in
/usr/local/nagios/etc/pnp
The configuration of the already mentioned modes of performance data processing will be described in more detail.
The default-mode is the simplest way to integrate the data collector process_perfdata.pl into nagios. Every event will trigger an execution of process-service-perfdata.
Initially you have to enable processing of performance data in nagios.cfg. Please note that this directive might already exist in the config file. Default is “0”.
process_performance_data=1
Data processing has to be disabled in the definition of every host or service whose performance data should NOT be processed.
define service {
    ...
    process_perf_data 0
    ...
}
Since Nagios 3.x it is possible to deactivate the export of environment variables (as part of optimizing the system for maximum performance). Unfortunately this directive has to be enabled to use the default mode. So either you use the default value (which means that the export is enabled) or you define the variable in nagios.cfg
enable_environment_macros=1
Additionally the command to process performance data is to be specified in nagios.cfg
service_perfdata_command=process-service-perfdata
Starting with Nagios 3.0 it may be useful to enable processing of performance data for hosts as well. Due to changed host check logic Nagios 3 now performs regularly scheduled host checks.
host_perfdata_command=process-host-perfdata
Nagios has to be notified about the referenced commands as well. If you used the quickstart installation guides for Nagios you can modify the definitions in commands.cfg. You can see that calling process_perfdata.pl doesn't require any arguments apart from specifying the option -d ( DATATYPE ) if you want to process performance data resulting from host checks.
define command {
    command_name    process-service-perfdata
    command_line    /usr/bin/perl /usr/local/nagios/libexec/process_perfdata.pl
}

define command {
    command_name    process-host-perfdata
    command_line    /usr/bin/perl /usr/local/nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}
Note: process_perfdata.pl cannot be started under control of ePN (embedded Perl Nagios). Therefore the script is explicitly called using /usr/bin/perl (or wherever your perl binary is located). If you use Nagios 3.x or do not use ePN there is no need to specify /usr/bin/perl.
Bulk mode is a bit more complicated than the default-mode but reduces the load on the nagios server significantly because the data collector process_perfdata.pl is not invoked for every service.
In bulk-mode nagios writes the data to a temporary file in a defined format. This file is processed by process_perfdata.pl at certain intervals. Nagios will take care of starting and running it periodically.
Processing of performance data has to be enabled in nagios.cfg
process_performance_data=1
Additionally some new directives are required
#
# service performance data
#
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file

#
# host performance data starting with Nagios 3.0
#
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file
The directives and their meaning:
service_perfdata_file: path to the temporary file which should contain the performance data.
service_perfdata_file_template: format of the temporary file. Data will be defined using nagios macros.
service_perfdata_file_mode: option “a” specifies that data is to be appended to the file.
service_perfdata_file_processing_interval: the interval is 15 seconds.
service_perfdata_file_processing_command: the command to be called during the interval.
The commands used have to be defined in Nagios. If you used the quickstart installation guides for Nagios you can modify the definitions in commands.cfg.
define command{
    command_name    process-service-perfdata-file
    command_line    $USER1$/process_perfdata.pl --bulk=/usr/local/nagios/var/service-perfdata
}

define command{
    command_name    process-host-perfdata-file
    command_line    $USER1$/process_perfdata.pl --bulk=/usr/local/nagios/var/host-perfdata
}
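With the file templates used for bulk mode, every check result becomes one line of KEY::VALUE pairs. The sketch below shows how such a line can be taken apart; it assumes that Nagios writes each \t in the template as a real tab character, and the function name parse_bulk_line and the sample values are made up for this example.

```python
def parse_bulk_line(line):
    """Turn one line of the temporary perfdata file into a dict."""
    fields = {}
    for item in line.rstrip("\n").split("\t"):
        # each item looks like KEY::VALUE; only the first "::" separates them
        key, separator, value = item.partition("::")
        if not separator:
            raise ValueError(f"not a KEY::VALUE pair: {item!r}")
        fields[key] = value
    return fields


sample = "\t".join([
    "DATATYPE::SERVICEPERFDATA",
    "TIMET::1192701894",
    "HOSTNAME::localhost",
    "SERVICEDESC::PING",
    "SERVICEPERFDATA::rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;;",
])
print(parse_bulk_line(sample))
```

The tab as separator is what allows blanks inside the performance data itself.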
NOTE: In bulk mode each run of process_perfdata.pl takes longer, so you should check the TIMEOUT value in etc/pnp/process_perfdata.cfg and adjust it appropriately.
For bulk mode with NPCD the configuration is identical to bulk mode except for the command used. If you used the quickstart installation guides for Nagios you can modify the definitions in commands.cfg.
define command{
    command_name    process-service-perfdata-file
    command_line    /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$
}

define command{
    command_name    process-host-perfdata-file
    command_line    /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
}
Using these commands the file service-perfdata will be moved to var/spool/perfdata/ after the interval specified in service_perfdata_file_processing_interval has passed. The Nagios macro $TIMET$ is appended to the filename to avoid unintentionally overwriting old files. The macro $TIMET$ contains the current timestamp in time_t format (seconds since the UNIX epoch).
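Because the suffix is a plain time_t value, the time at which a spool file was moved can be read back from its name. A minimal sketch, assuming file names of the form produced by the commands above; the function name spool_file_time is hypothetical and the sample timestamp is only an illustration.

```python
from datetime import datetime, timezone


def spool_file_time(filename):
    """Return the UTC time encoded in the $TIMET$ suffix of a spool file name."""
    _, _, suffix = filename.rpartition(".")
    return datetime.fromtimestamp(int(suffix), tz=timezone.utc)


print(spool_file_time("service-perfdata.1192701894").isoformat())
```

Files whose timestamp is far in the past are a hint that NPCD is not running or cannot keep up.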
In the directory /usr/local/nagios/var/spool/perfdata files are gathered to be processed by NPCD.
NPCD monitors the spool directory and passes the file names to process_perfdata.pl. This way processing of performance data is completely decoupled from nagios.
Before starting NPCD you have to check the paths to the spool directory and to process_perfdata.pl specified in the config file npcd.cfg.
The only thing that remains is to start NPCD.
/usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
The option -d starts NPCD as a daemon in the background.
If everything went well until now you can try to call PNP using your web browser.
Assuming Nagios can be reached using the URL /nagios, PNP should be called using /nagios/pnp/index.php.
Called without any arguments PNP looks for RRD and XML files in share/perfdata and shows all graphs of the first host.
ATTENTION: Immediately after (re-)starting Nagios after you enabled the processing of performance data you will get error messages in your browser because performance data has to be collected and stored in RRD files. Depending on the check interval you are using you have to wait some time before you can view the graphs.
Calling make install-config during installation will create a sample config file (etc/pnp/process_perfdata.cfg-sample). The values in the sample file correspond to the defaults used by process_perfdata.pl, so normally no file called process_perfdata.cfg exists during operation.
However you can influence the way process_perfdata.pl works by changing options which have to be specified in process_perfdata.cfg.
The most important options when first running PNP are LOG_LEVEL and LOG_FILE. We recommend setting the LOG_LEVEL value to “2” so you can track what process_perfdata.pl does.
Most likely we will ask for excerpts from perfdata.log if you open a support request on the mailing lists.
During normal operation the debug level should be set to 0 to avoid performance issues due to unnecessary entries in the log file.
Some basic settings should be checked
1. Have any RRD and XML files been created?
process_perfdata.pl will create a new directory under share/perfdata for every host. In this directory an RRD database and an XML file will be created for every service.
The error message “No valid RRD Files found” indicates that the plugin doesn't deliver any/valid performance data! Sometimes you have to specify additional options so that performance data is produced. In some cases a wrapper script might help.
However not all checks provide performance data. That applies - among others - to “check_ping” in contrast to “check_icmp” which does provide data (starting with Nagios plugin version 1.4.12 check_ping does provide performance data).
Using the web interface the detail information of hosts/services show a field “Performance Data”. If it is empty there is no data available so no files are written to the appropriate directory and that is why PNP does not provide you with graphs!
The following image shows the information of a “PING” service. The output of the plugin is surrounded by a blue border, the performance data by a red one.
2. Has nagios called process_perfdata.pl?
In the config file for process_perfdata.pl (etc/pnp/process_perfdata.cfg) you can increase the debug level. Data processing will be logged in var/perfdata.log.
3. Graphs are shown without text? Have a look at the requirements.
4. You can use the script verify_pnp_config in the contrib directory of the installation folder to check your settings and whether performance data is present and/or valid. The syntax is quite simple:
./verify_pnp_config -m <mode>
where <mode> is one of “default”, “bulk” or “npcd” (without quotes).
Since PNP 0.4.12 the contrib directory contains a perl script (verify_pnp_config) which enables you to check the configuration settings as well as performance data of hosts or services. It can be used prior and during runtime of PNP.
* Note *: The information applies to verify_pnp_config v0.1.17 which is available in the current developer version (starting with SVN Rev. 644) downloadable via http://www.pnp4nagios.org/pnp/dwnld.
Older versions may have fewer options so in the descriptions of the various options you will find hints to the PNP versions.
* Note *: The “long” options always start with two ”-” which isn't clearly visible in the text.
PNP-Version 0.4.12 contains verify_pnp_config 0.1.2
PNP-Version 0.4.13 contains verify_pnp_config 0.1.9
PNP-Version 0.4.14 contains verify_pnp_config 0.1.12
* Attention * 0.1.9 contains a bug which leads to an error message while checking the template definition in bulk and NPCD mode. The last parameter will be flagged as erroneous even if it is correct. Please add the statement $val .= "\\t"; prior to line 632 so that the lines look like this:
629 if ($val !~ /([A-Z]+?::[A-Z\$]+?.?)+/) {
630     info ("invTemplate");
631 }
632 $val .= "\\t";
633 while ($val =~ s/([A-Z]+?)::(.+?)([^A-Z\$].)//) {
* Attention * 0.1.12 contains bugs when checking performance data. If you need this feature then please use the current SVN version (starting with Rev. 633).
0.1.12 contains another bug which leads to an error message while checking the template definition in bulk and NPCD mode. SERVICEOUTPUT and HOSTOUTPUT resp. are reported missing but they don't belong to the template definition of PNP. They are in the default config file of Nagios instead (fixed in Rev. 639).
Checking the configuration can be done executing
./verify_pnp_config -m <mode>
replacing <mode> by default, bulk or NPCD.
Specifying the option -h or --help respectively shows the following lines:
-h, --help         print these lines
-b, --basedir=s    Nagios Base directory (default: /usr/local/nagios)
-B, --binary=s     Nagios binary (default: nagios)
-c, --config=s     Nagios main config file (default: /usr/local/nagios/etc/nagios.cfg)
-m, --mode=s       PNP mode ("default", "bulk", "NPCD")
-l, --logfile=s    check configure log file
-N, --npcdcfg=s    PNP config file for NPCD mode (default: /usr/local/nagios/etc/pnp/npcd.cfg)
-P, --ppcfg=s      process_perfdata config file (default: /usr/local/nagios/etc/pnp/process_perfdata.cfg)
-p, --precheck     use config files instead of objects cache
-r, --rrdtool=s    specify the location of the RRDtool binary
-R, --RRDpath=s    specify the perfdata directory (default: /usr/local/nagios/share/perfdata) or "no" for no check
-U, --resource=s   location of the resource config file (default: /usr/local/nagios/etc/resource.cfg)
-M, --monitor=s    specify the monitoring product (default: nagios)
-L, --layout=s     specify a layout (Nagios2, Nagios3, SuSE, Fedora)
-T, --template=s   specify the path to the templates directory (default /usr/local/nagios/share/pnp/templates.dist)
-u, --user=s       user of the perfdata directory
-g, --group=s      group of the perfdata directory
-q, --quiet        quiet mode, non-zero return code will indicate errors
-o, --object=s     Nagios object (host name, service description)
-n, --native       show messages in native language (so far "es" or "de")
-e, --english      show english messages/links
-d, --debug        some debugging output
The Nagios program and access to the main configuration file are always necessary. If you have non-standard paths because you installed a Nagios package you can try to use one of the predefined layouts. “suse” and “fedora” should work on the appropriate distributions while “nagios2” and “nagios3” should work on other systems, depending on the Nagios version you installed. If none of these methods work you can use three options (-b, -B and -c) to specify the nagios base directory, the name of the binary and the location of the main config file. If the program name starts with a ”/” then this value is taken as an absolute path which isn't modified anymore. If it doesn't start with a ”/” then the path is composed of the basedir, the string “bin” and the binary. Using -U you can specify the location of the resource config file. (-b / -c starting with PNP 0.4.12; -B starting with Rev. 612; -U starting with Rev. 635)
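The rule for locating the Nagios binary from -b and -B can be written down as a small function. This is only a sketch of the rule as described, not code taken from verify_pnp_config; the function name nagios_binary_path is made up, and the defaults are the ones listed in the help output.

```python
import posixpath


def nagios_binary_path(basedir="/usr/local/nagios", binary="nagios"):
    """Compose the path of the Nagios binary from the -b and -B values."""
    if binary.startswith("/"):
        # an absolute path is used as it is
        return binary
    # otherwise: base directory, the string "bin" and the binary name
    return posixpath.join(basedir, "bin", binary)


print(nagios_binary_path())
print(nagios_binary_path(binary="/usr/sbin/nagios3"))
```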
Without specifying any options the help page will be shown so you'll have to specify either the mode or an object.
Using the option -m (--mode) you specify one of the PNP modes whose settings will be checked. The option -l <filename> (--logfile=<filename>) allows you to check the config during installation of PNP. You have to execute ./configure (with additional options if necessary) first, which creates the file config.log. This name has to be passed as parameter value. The script checks if software requirements are met and if several settings have been specified correctly. That includes a call of the RRDtool binary so you may use the option -r <location> (--rrdtool <location>) if the binary cannot be found at /usr/bin/rrdtool.
(-m / -l starting with PNP 0.4.12, -r starting with PNP 0.4.13)
The script checks if owner and group of the directories and files below the perfdata folder correspond to the values given in nagios.cfg. Additionally the xml files are checked for non-zero return codes of RRDtool. Using the option -R (--RRDpath) you can specify the directory where the RRD files are located if its place is non-standard. If you don't wish these checks to be performed please specify “no” as directory name. Using the options -u (--user) and -g (--group) you can specify user and/or group of the perfdata directory if they don't match the values of the nagios user.
(-R starting with PNP 0.4.14 / SVN Rev. 598; -u / -g starting with Rev. 635).
After the installation the changes in nagios.cfg can be checked using the option -p (--precheck) before restarting nagios. This way you can correct any errors without restarting every time.
Together with the option -o <object> (--object=<object>) you specify a string which is compared to hostnames and/or service descriptions in the objects cache file. The string should be enclosed in quotes to escape blanks and several special characters.
If the string matches, the name and any performance data will be shown. If no or invalid performance data is present appropriate messages will be given.
Appending a semicolon to the string will result in comparing only hostnames, prepending a semicolon only inspects service descriptions. A semicolon within the string separates hostname and service description.
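The semicolon rules can be illustrated by splitting the string before any comparison takes place. The sketch below only covers the splitting; how verify_pnp_config compares the parts is not shown here, and the reading that a semicolon inside the string requires both parts to match is an assumption. The function name split_object_filter is hypothetical.

```python
def split_object_filter(text):
    """Split the value of -o into a host part and a service part."""
    if ";" not in text:
        # no semicolon: compare against host names and service descriptions
        return {"host": text, "service": text, "require_both": False}
    host, _, service = text.partition(";")
    return {
        "host": host or None,        # None: host names are not compared
        "service": service or None,  # None: service descriptions are not compared
        "require_both": bool(host and service),  # assumption, see above
    }


for value in ("PING", "localhost;", ";PING", "localhost;PING"):
    print(repr(value), split_object_filter(value))
```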
(-o starting with PNP 0.4.12, semicolon starting with PNP 0.4.13)
When using the NPCD mode you can use -N <config file> (--npcdcfg=<config file>) to specify the location of the config file if its name or location differs from the default (NAGIOS_BASE/etc/pnp/npcd.cfg) (-N starting with PNP 0.4.13).
Using the option -P <config file> (--ppcfg=<config file>) you can specify the name of the config file for process_perfdata.pl if name or location differs from the default (NAGIOS_BASE/etc/pnp/process_perfdata.cfg) (-P starting with PNP 0.4.13).
Using the option -M (--monitor) enables you to specify the product which delivers the data to PNP. The default is “nagios” but “icinga” is supported now as well. Additionally you may have to use the options -b, -B and -c as well. (-M starting with Rev. 640)
Sometimes changes in template files result in errors which are hard to find in the web GUI. Using the option -T the template files are checked for errors. Specify the path to the templates directory as a parameter. (-T starting with Rev. 655)
Using -n (--native) you can specify “es” or “de” to see Spanish or German messages, respectively. (-n starting with Rev. 664, PNP 0.4.15)
The option -e (--english) enables you to force the use of English messages if the script detects German language settings (-e starting with PNP 0.4.13).
The option -d (--debug) will output additional lines which may help during troubleshooting whereas -q (--quiet) suppresses all output. Errors will result in a non-zero return code.
(-d starting with PNP 0.4.12, -q starting with PNP 0.4.13)
Each output line will start with a letter indicating the type of information:
[I] informational message about settings, things to be done, …
[A] actions to be taken
[W] warning message
[E] error message: PNP will not work without resolving the problem(s)
[H] hint: it might be worth reading the appropriate documentation
[D] debugging message, hopefully showing the source of your problem
(one letter types starting with PNP 0.4.13).
check_procs is an example for a plugin which doesn't deliver performance data:
./check_procs -a ndo2db -w 1: -c 0:
PROCS OK: 2 processes with args 'ndo2db'
This can be changed with the following wrapper script
check_procs.sh
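#!/bin/bash
# Wrapper around check_procs that appends the process count as performance data
LINE=$(/usr/local/nagios/libexec/check_procs "$@")
RC=$?
# The third word of the plugin output is the number of processes
COUNT=$(echo "$LINE" | awk '{print $3}')
echo "$LINE | procs=$COUNT"
# Pass the return code of the plugin on to Nagios
exit $RC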
Now you'll get the number together with the required label
./check_procs.sh -a ndo2db -w 1: -c 0:
PROCS OK: 2 processes with args 'ndo2db' | procs=2
Of course PNP should be easily accessible. You do not want to search long for the right graph.
Nagios itself features external URLs using so called extended info configs. Due to changes between Nagios 2.x and Nagios 3.x both versions are described.
With Nagios 2.x the integration of external URLs into the nagios web interface is made using Extended Info Objects for services. For PNP we use the directive action_url to call the PNP web frontend with the appropriate options.
define serviceextinfo {
    host_name            localhost
    service_description  load
    action_url           /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
}
You have to specify an additional Extended Info Definition for every service.
Since Nagios 3.0 the action_url directive has been moved to the host or service definition. The serviceextinfo and hostextinfo definitions are deprecated. This way the definition of URLs to the PNP interface has been simplified.
First two nagios templates are defined. If you used the Nagios quickstart installation guides you can append these lines to templates.cfg:
define host {
    name        host-pnp
    register    0
    action_url  /nagios/pnp/index.php?host=$HOSTNAME$
}

define service {
    name        srv-pnp
    register    0
    action_url  /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
}
These two templates can now be included via “use srv-pnp” or “use host-pnp” for services and hosts respectively. If you used the quickstart installation guide you might for example edit the file localhost.cfg and add the template to the host or service definition as follows:
define host{
    use        linux-server,host-pnp   ; Name of host templates to use
                                       ; This host definition will inherit all variables that are defined
                                       ; in (or inherited by) the linux-server host template definition.
    host_name  localhost
    alias      localhost
    address    127.0.0.1
}

define service{
    use                  local-service,srv-pnp   ; Name of service template to use
    host_name            localhost
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}
The links to the correct URLs are created automagically.
Starting with PNP 0.4.13 you can integrate PNP into Nagios in a way that you have current graphs without clicking any icons. This can be accomplished using the CGI Includes which allow us to include JavaScript code in the status detail view ( status.cgi ).
Prerequisites:
/usr/local/nagios/share/pnp/include/js contains the files prototype.js and overlib_mini.js. Depending on the distribution the share folder may be located elsewhere. If in doubt have a look at the alias definition in the Nagios configuration file of your web server.
status-header.ssi from the contrib folder of the PNP package was copied to /usr/local/nagios/share. Please review the paths in the file status-header.ssi (both js-files and the ajax.php).
Definition:
define host {
    name        host-pnp
    register    0
    action_url  /nagios/pnp/index.php?host=$HOSTNAME$' onmouseover="get_g('$HOSTNAME$','_HOST_')" onmouseout='clear_g()
}

define service {
    name        srv-pnp
    register    0
    action_url  /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$' onmouseover="get_g('$HOSTNAME$','$SERVICEDESC$')" onmouseout='clear_g()
}
After a restart of Nagios (after modifying the definitions) the result might look like this:
The behaviour of the PNP Web-Frontend can be controlled through the config file etc/pnp/config.php. This file will be overwritten during updates of PNP as the paths and options are detected during ./configure.
Own adjustments should be made in etc/pnp/config_local.php. If this file does not exist the file config.php can be taken as a guideline.
The most important parameters are the following:
$conf['rrdtool'] = "/usr/bin/rrdtool";
The path to the RRDtool binary. Will be detected by ./configure
.
$conf['graph_width'] = "500"; $conf['graph_height'] = "100";
Height and width of the RRD graphs.
$conf['graph_opt'] = "";
Additional options passed with every call of RRDTool, for example ----slope-mode
to smooth the graphs.
$conf['rrdbase'] = "/usr/local/nagios/share/perfdata/";
The path to the RRD and XML files created by process_perfdata.pl
.
$conf['page_dir'] = "/usr/local/nagios/etc/pnp/pages/";
The path to the config file for the pages.
PNP pages will be refreshed every n seconds.
$conf['refresh'] = "90";
Max. age of RRD files in seconds. After reaching this value links to the graphs will be marked as inactive.
$conf['max_age'] = 60*60*6;
Base URL to the Nagios CGIs.
$conf['nagios_base'] = "/nagios/cgi-bin";
List of users who are allowed to view links to the services of the current host.
$conf['allowed_for_service_links'] = "EVERYONE";
List of users who can view/access the host search field.
$conf['allowed_for_host_search'] = "EVERYONE";
If PNP is called with a host only ( index.php?host=<myserver> ), the defined user is shown an overview of all services related to this host.
$conf['allowed_for_host_overview'] = "EVERYONE";
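The max_age setting above can be pictured with a short Python sketch. It is only an illustration, not PNP's actual PHP code: the function name is hypothetical, and it assumes that "age" means the time since the last update of the RRD file and that a link becomes inactive once that age exceeds max_age.

```python
import time

MAX_AGE = 60 * 60 * 6  # same value as $conf['max_age'], in seconds


def is_inactive(last_update, now=None, max_age=MAX_AGE):
    """Return True when the last update lies more than max_age seconds back.

    last_update and now are UNIX timestamps. Treating the file's last
    update time as its "age" is an assumption made for this sketch.
    """
    if now is None:
        now = time.time()
    return (now - last_update) > max_age


# A file updated seven hours ago would be shown as inactive.
print(is_inactive(last_update=0, now=7 * 60 * 60))
```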
The periods of time the RRD graphs will show are determined using the array $views[]. The title and number of graphs can be specified globally in this place.
$views[0]["title"] = "4 Hours";
$views[0]["start"] = ( 60*60*4 );
$views[1]["title"] = "24 Hours";
$views[1]["start"] = ( 60*60*24 );
$views[2]["title"] = "One Week";
$views[2]["start"] = ( 60*60*24*7 );
$views[3]["title"] = "One Month";
$views[3]["start"] = ( 60*60*24*30 );
$views[4]["title"] = "One Year";
$views[4]["start"] = ( 60*60*24*365 );
In the overview PNP shows five timeranges which can be defined in config.php.
Additionally you can influence the end of the timeranges via the URL. This can be useful to automatically create PDF documents. The ranges can be defined using the option “end”.
Example:
http://<Nagios host>/pnp/index.php?host=<hostname>&srv=<servicedesc>&end=-1week
The graph will end one week prior to the current date and time. The start will be adjusted depending on the selected view.
end | view | result |
---|---|---|
- | - | all views ending at current timestamp |
x | - | all views ending at defined date |
- | x | one view ending at current timestamp |
x | x | one view ending at defined date |
Examples of different specifications
format | description |
---|---|
2009W04 | 4th week of 2009 |
1.5.2009 | May 1st, 2009 |
-1 day | one day back |
-3 weeks | 3 weeks back |
-1 year | one year back |
yesterday | yesterday |
Relative offsets (-<n> <timeperiod>) result in timestamps relative to the current time; otherwise the ending time is at midnight.
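How "end" and the selected view interact can be sketched in a few lines of Python. This is not PNP's own code (PNP is written in PHP): the function names are hypothetical, only relative offsets in days, weeks and years are handled, and a year is approximated as 365 days as in the $views[] definition.

```python
import re
from datetime import datetime, timedelta

# Seconds per unit for relative offsets such as "-1 day" or "-3 weeks".
UNIT_SECONDS = {"day": 86400, "week": 7 * 86400, "year": 365 * 86400}


def end_from_offset(offset, now):
    """Turn a relative offset like '-1week' or '-3 weeks' into an end time."""
    match = re.fullmatch(r"(-\d+)\s*(day|week|year)s?", offset.strip())
    if match is None:
        raise ValueError("unsupported offset: %r" % offset)
    count, unit = int(match.group(1)), match.group(2)
    return now + timedelta(seconds=count * UNIT_SECONDS[unit])


def graph_window(view_seconds, end):
    """Return (start, end); the start trails the end by the view length."""
    return end - timedelta(seconds=view_seconds), end


now = datetime(2009, 5, 8, 12, 0, 0)
end = end_from_offset("-1week", now)
start, end = graph_window(60 * 60 * 4, end)  # the "4 Hours" view
print(start, end)
```

With these inputs the four-hour window ends exactly one week before "now"; the start moves with the end, as described above.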
“Pages” let you collect graphs of different hosts/services on one page. For example, you can display the traffic rates of all tape libraries on a single page. Regular expressions are supported, so you can accomplish a lot with only a few definitions, provided that your hosts and services are named consistently. The directory specified using “$conf['page_dir']” contains one or more files with the extension “.cfg”.
The file name (without the extension) appears in the list of available pages and will be used as the title of the browser window. Comments start with a hash sign (#) and are possible within lines as well. Each file contains a “page” definition which specifies the name of the page and determines whether the following graph definitions contain regular expressions or not.
Attention: “host_name” and “service_desc” refer to the name of the file in the perfdata directory, not to the definition in Nagios. Blanks are replaced by underscores (_).
define page {
   use_regex 1          # 0 = use no regular expressions, 1 = use regular expressions
   page_name test-page  # page description
}
One or more “graph” definitions follow:
define graph {
   host_name    host1,host2,host3
   service_desc Current_Load
}

define graph {
   host_name    host4
   service_desc Current_Users
}
And now some definitions with regular expressions. First, all hosts whose names start with “Tape”:
define graph {
   host_name    ^Tape
   service_desc Traffic
}
all hosts whose names end with “00”:
define graph {
   host_name    00$
   service_desc Load
}
all services of localhost whose names contain “a” or “o”, respectively:
define graph {
   host_name    localhost
   service_desc a|o
}
all services whose names contain an underscore followed by (at least) three digits on all hosts whose names start with “UX”:
define graph {
   host_name    ^UX
   service_desc _\d{3}
}
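The effect of the regular expressions above can be tried out with a small Python sketch. The host and service names are made up for illustration, and the sketch assumes that a pattern matches when it is found anywhere in the name (unless anchored with ^ or $), which is how the examples above read. PNP itself evaluates the patterns in PHP, so details may differ.

```python
import re

# Hypothetical (host, service) names as they would appear in the perfdata
# directory, i.e. with blanks already replaced by underscores.
PERFDATA = [
    ("Tape01", "Traffic"),
    ("Tape02", "Traffic"),
    ("web00", "Load"),
    ("UX17", "disk_001"),
    ("UX17", "Load"),
]


def select_graphs(host_pattern, service_pattern, perfdata=PERFDATA):
    """Return all (host, service) pairs for which both patterns match."""
    return [
        (host, service)
        for host, service in perfdata
        if re.search(host_pattern, host) and re.search(service_pattern, service)
    ]


print(select_graphs(r"^Tape", r"Traffic"))
print(select_graphs(r"^UX", r"_\d{3}"))
```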
PNP uses templates to influence the appearance of RRD graphs.
The selected check_command determines which template will be used to control the graph. The following describes where templates are stored and how the “right” template is chosen.
Templates are stored at two places in the file system.
If the graph for the service “http” on host “localhost” should be shown, PNP will look for the XML file perfdata/localhost/http.xml and read its contents. The XML files are created automatically and contain information about the particular host and service. The header contains information about the plugin and the performance data. The XML tag <TEMPLATE> identifies which PNP template will be used for this graph.
Excerpt from perfdata/localhost/http.xml:
<NAGIOS>
  <DATASOURCE>
    <TEMPLATE>check_http</TEMPLATE>
    <DS>1</DS>
    <NAME>time</NAME>
    <UNIT>s</UNIT>
    <ACT>0.006721</ACT>
    <WARN>1.000000</WARN>
    <CRIT>2.000000</CRIT>
    <MIN>0.000000</MIN>
    <MAX></MAX>
  </DATASOURCE>
  <DATASOURCE>
    <TEMPLATE>check_http</TEMPLATE>
    <DS>2</DS>
    <NAME>size</NAME>
    <UNIT>B</UNIT>
    <ACT>263</ACT>
    <WARN></WARN>
    <CRIT></CRIT>
    <MIN>0</MIN>
    <MAX></MAX>
  </DATASOURCE>
  ...
</NAGIOS>
PNP will look for a template with the name check_http.php in the following sequence:
The template default.php has a special role: it is used whenever no other applicable template is found.
PNP templates are PHP files which are included during execution of PNP using the PHP function include(). This means that any PHP code in a template will be interpreted, so all values can be manipulated.
PNP templates must have the following characteristics:
These two arrays ($opt[] and $def[]) are used to call 'rrdtool graph', so every option that RRDtool supports is possible. All options of RRDtool are described very thoroughly on the RRDtool Homepage.
If both arrays contain more than one set of data, graphs will be created for every set.
Inside the templates the data from the related XML files can be used.
Using the relatively simple template response.php we will describe the most important options.
<?php
#
$opt[1] = "--title \"Response Time For $hostname / $servicedesc\" ";
#
$def[1]  = "DEF:var1=$rrdfile:$DS[1]:AVERAGE " ;
$def[1] .= "AREA:var1#00FF00:\"Response Times \" " ;
$def[1] .= "LINE1:var1#000000 " ;
$def[1] .= "GPRINT:var1:LAST:\"%3.4lg %s$UNIT[1] LAST \" ";
$def[1] .= "GPRINT:var1:MAX:\"%3.4lg %s$UNIT[1] MAX \" ";
$def[1] .= "GPRINT:var1:AVERAGE:\"%3.4lg %s$UNIT[1] AVERAGE \" ";
?>
Note: as the number (1) and the letter “L” look alike in this listing: the format "%3.4lg" contains the lowercase letter “l”, not the digit one.
$opt[1] = "--title …
sets RRDtool options for the first set of data, here the title. Embedded quotes are escaped using a backslash (\). The variables $hostname and $servicedesc were determined through the call of PNP and are available to the template as well.
$def[1] = "DEF:var1=$rrdfile:$DS[1]:AVERAGE ";
defines which data is to be read from which RRD file. $rrdfile contains the path to the RRD file of this service. $DS[1] refers to the first data series from the RRD file.
$def[1] .= "AREA:var1#00FF00:\"Response Times \" ";
The operator ".=" appends more data to the array element $def[1]. An area will be drawn using data from the variable var1. The color is given in hexadecimal RGB notation (red, green, blue); #00FF00 is pure green. The label is “Response Times”.
$def[1] .= "LINE1:var1#000000 ";
To finish off the area just drawn, a line (LINE1) will be drawn in black (#000000).
$def[1] .= "GPRINT:var1:LAST:\"%3.4lg %s$UNIT[1] LAST \" ";
$def[1] .= "GPRINT:var1:MAX:\"%3.4lg %s$UNIT[1] MAX \" ";
$def[1] .= "GPRINT:var1:AVERAGE:\"%3.4lg %s$UNIT[1] AVERAGE \" ";
The three GPRINT lines build up the legend for the graph. The values are formatted using printf syntax.
Using the data collector process_perfdata.pl, PNP stores not only performance data but also other values exported by Nagios. These values are stored in the XML file associated with the appropriate service.
In the first part of the XML file the performance data is stored in separate components.
<NAGIOS>
  <DATASOURCE>
    <TEMPLATE>check_http</TEMPLATE>
    <DS>1</DS>
    <NAME>time</NAME>
    <UNIT>s</UNIT>
    <ACT>0.006721</ACT>
    <WARN>1.000000</WARN>
    <CRIT>2.000000</CRIT>
    <MIN>0.000000</MIN>
    <MAX></MAX>
  </DATASOURCE>
  ....
</NAGIOS>
The field <DS> designates the data source. It identifies the data series in the RRD file and is also the key of the following arrays.
The array element $UNIT[1] contains the unit of measurement of the first data series.
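To make the relation between the <DS> field and the array keys concrete, here is a minimal Python sketch that reads a shortened copy of the XML shown above and builds one dictionary per field, keyed by the <DS> number. It only mirrors the idea; PNP does this in PHP, and the function name arrays_by_ds is made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML excerpt from the text.
XML = """<NAGIOS>
  <DATASOURCE>
    <TEMPLATE>check_http</TEMPLATE>
    <DS>1</DS>
    <NAME>time</NAME>
    <UNIT>s</UNIT>
    <ACT>0.006721</ACT>
  </DATASOURCE>
  <DATASOURCE>
    <TEMPLATE>check_http</TEMPLATE>
    <DS>2</DS>
    <NAME>size</NAME>
    <UNIT>B</UNIT>
    <ACT>263</ACT>
  </DATASOURCE>
</NAGIOS>"""


def arrays_by_ds(xml_text):
    """Build one dict per XML field, keyed by the integer value of <DS>."""
    arrays = {}
    for source in ET.fromstring(xml_text).findall("DATASOURCE"):
        ds = int(source.findtext("DS"))
        for field in source:
            if field.tag != "DS":
                arrays.setdefault(field.tag, {})[ds] = field.text or ""
    return arrays


arrays = arrays_by_ds(XML)
# Corresponds to $UNIT[1] and $NAME[2] in a template.
print(arrays["UNIT"][1], arrays["NAME"][2])
```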
The XML file contains other information. When process_perfdata.pl is used in default mode all available macros are at hand with the current values. For the benefit of readability the following lines show only an extract.
<NAGIOS>
  ...
  <NAGIOS_SERVICENOTIFICATIONID>8418</NAGIOS_SERVICENOTIFICATIONID>
  <NAGIOS_SERVICENOTIFICATIONNUMBER>0</NAGIOS_SERVICENOTIFICATIONNUMBER>
  <NAGIOS_SERVICEOUTPUT>HTTP OK HTTP/1.1 200 OK - 10087 bytes in 0.125 seconds</NAGIOS_SERVICEOUTPUT>
  <NAGIOS_SERVICEPERCENTCHANGE>0.00</NAGIOS_SERVICEPERCENTCHANGE>
  <NAGIOS_SERVICEPERFDATA>time=0.124811s;;;0.000000 size=10087B;;;0</NAGIOS_SERVICEPERFDATA>
  <NAGIOS_SERVICEPERFDATAFILE></NAGIOS_SERVICEPERFDATAFILE>
  <NAGIOS_SERVICEPROBLEMID>0</NAGIOS_SERVICEPROBLEMID>
  <NAGIOS_SERVICESTATE>OK</NAGIOS_SERVICESTATE>
  <NAGIOS_SERVICESTATEID>0</NAGIOS_SERVICESTATEID>
  <NAGIOS_SERVICESTATETYPE>HARD</NAGIOS_SERVICESTATETYPE>
  <NAGIOS_SHORTDATETIME>27-12-2007 13:51:23</NAGIOS_SHORTDATETIME>
  ...
</NAGIOS>
The various XML fields can be used as variables in the PNP templates. Each field is available as a variable with the same name.
The value of the field <NAGIOS_SERVICEOUTPUT> is available as the variable $NAGIOS_SERVICEOUTPUT.
As already described under “What are templates?” the appearance of graphs depends on the check command used.
There are situations where this behaviour must be overridden, in particular when universal commands have been defined.
Example:
define command {
   command_name check_nrpe
   command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -C $ARG1$ -a "$ARG2$"
}
This would lead to a call of the check_nrpe.php template even though a completely different plugin is executed on the monitored host via NRPE.
PNP, or more precisely process_perfdata.pl, will search for a config file (<check_command>.cfg) in the etc/pnp/check_commands directory and read its contents (if available).
As our example command is called check_nrpe, it looks for etc/pnp/check_commands/check_nrpe.cfg.
During installation a sample config file with the extension .cfg-sample is copied to etc/pnp/check_commands.
Two options can be set in this config file:
# check_command check_nrpe!load!-w 4,4,4 -c 5,5,5
# ________0__________|       |       |
# ________1__________________|       |
# ________2__________________________|
#
CUSTOM_TEMPLATE = 1

The setting CUSTOM_TEMPLATE = 1 assures that only the contents of $ARG1$ will be used as a template name. As $ARG1$ contains “load” in this example, the template name would be “load.php”.
CUSTOM_TEMPLATE = 0,1 results in “check_nrpe_load.php”
CUSTOM_TEMPLATE = 1,0 results in “load_check_nrpe.php”
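The three CUSTOM_TEMPLATE examples follow a simple rule that can be sketched in Python: split the check command at the exclamation marks, pick the parts named by the index list, and join them with underscores. The function name is hypothetical and the real logic lives in process_perfdata.pl, so take this as an illustration of the rule only.

```python
def template_name(check_command, custom_template):
    """Derive the template file name from a check command.

    check_command   -- e.g. "check_nrpe!load!-w 4,4,4 -c 5,5,5"
    custom_template -- value of CUSTOM_TEMPLATE, e.g. "1" or "0,1"
    """
    parts = check_command.split("!")  # part 0 is the command name
    indexes = [int(i) for i in custom_template.split(",")]
    return "_".join(parts[i] for i in indexes) + ".php"


command = "check_nrpe!load!-w 4,4,4 -c 5,5,5"
for setting in ("1", "0,1", "1,0"):
    print(setting, "->", template_name(command, setting))
```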
The option “DATATYPE” controls the datatype which is used during creation of the RRD database. Default is “GAUGE”. For continuously increasing counter values the type should be “COUNTER”. Plugin developers should use the unit “c” for counters, but this is not always the case.
To set all datasources to COUNTER:
DATATYPE = COUNTER
Setting datasources to different types (starting with PNP-0.4.11):
DATATYPE = GAUGE,GAUGE,COUNTER,COUNTER
This option has effect only during creation of the RRD database.
More datatypes are explained in the RRDTool documentation found at rrdcreate.
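The two forms of the DATATYPE option can be pictured as follows. This Python sketch is an illustration only: the function name is made up, and the fallback to GAUGE when the list is shorter than the number of datasources is an assumption, not documented behaviour.

```python
def datatypes_for(setting, datasource_count, default="GAUGE"):
    """Expand a DATATYPE value into one type per datasource.

    A single value applies to every datasource; a comma-separated list is
    applied by position. Missing positions fall back to `default`, which
    is an assumption of this sketch.
    """
    types = [t.strip() for t in setting.split(",") if t.strip()]
    if len(types) == 1:
        return types * datasource_count
    return [types[i] if i < len(types) else default
            for i in range(datasource_count)]


print(datatypes_for("COUNTER", 3))
print(datatypes_for("GAUGE,GAUGE,COUNTER,COUNTER", 4))
```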
In a few situations it might be necessary to limit the values which are valid for RRDTool.
RRD databases can be created with fixed minimum and maximum values. You will find further details at http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html.
Account for the maximum value taken from the performance data (starting with PNP-0.4.13)
USE_MAX_ON_CREATE = 1
Account for the minimum value taken from the performance data (starting with PNP-0.4.13)
USE_MIN_ON_CREATE = 1
If Nagios is implemented as a distributed system you have to decide where PNP should be installed.
Technically the choice does not matter: PNP can be installed on the slave(s) as well as on the master server, or only on the master.
If PNP is running on the master you have to make sure that data passed via send_nsca from the slave server(s) contains performance data. Often another check command is used on the master.
To help PNP on the master recognize which check command was used on the slave to collect the information, process_perfdata.pl evaluates an additional field at the end of the performance data.
OK - 127.0.0.1: rta 2.687ms, lost 0% | rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;; [check_icmp]
If PNP finds a string enclosed in brackets at the end of performance data it will be recognized as check command and will be used as PNP template.
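Detecting the bracketed check command can be sketched with a regular expression. The Python below is not taken from process_perfdata.pl; the function name and the exact pattern are assumptions chosen to match the example line above.

```python
import re


def split_check_command(perfdata):
    """Split off a trailing "[check_command]" from a perfdata string.

    Returns (perfdata_without_suffix, check_command) or (perfdata, None)
    when no bracketed suffix is present.
    """
    match = re.search(r"\s*\[([^\[\]]+)\]\s*$", perfdata)
    if match is None:
        return perfdata.strip(), None
    return perfdata[:match.start()].strip(), match.group(1)


data, command = split_check_command(
    "rta=2.687ms;3000.000;5000.000;0; pl=0%;80;100;; [check_icmp]"
)
print(command)
print(data)
```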
Nagios documentation related to this topic can be found here. The command used in the documentation can be adapted easily.
define command{
        command_name    submit_check_result
        command_line    /usr/local/nagios/libexec/eventhandlers/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$'
        }
should be changed to
define command{
        command_name    submit_check_result
        command_line    /usr/local/nagios/libexec/eventhandlers/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$ | $SERVICEPERFDATA$ [$SERVICECHECKCOMMAND$]'
        }
The plugin check_multi is one of the first plugins to use the new features of Nagios 3.x. Check_multi can execute multiple Nagios plugins but returns the results as a single service. The output of check_multi consists of several lines in order to display this amount of information.
This results in some difficulties for PNP, which has to extract the information of several plugins from the performance data. Together with Matthias Flacke, developer of check_multi, we have found a solution to assign the data to the appropriate plugins.
In big installations one will sooner or later notice that processing the performance data results in a relatively high I/O load. RRDtool has to perform a great many disk updates but cannot use the disk cache in an optimal way.
One improvement is made by collecting and sorting the data. It is more effective to write many updates to an RRD database in one block. The disk cache can be used more effectively that way.
The current RRDtool ( SVN trunk 1550+ ) contains rrdcached which should improve exactly this situation.
At this point I'd like to thank Florian octo Forster, Kevin Brintnall and Tobi Oetiker. The development of this daemon has been coordinated in exemplary fashion on the rrd-developers mailing list.
rrdcached works as a daemon in the background and opens a UNIX or TCP socket on which it waits for requests from rrdtool.
rrdcached recognizes some important options which are passed during startup.
Option -l defines the socket on which the daemon listens for update requests. The default TCP port is 42217.
-l unix:/path/to/rrdcached.sock
-l 127.0.0.1
-l 127.0.0.1:8888
Option -L defines an unprivileged socket which can only be used to trigger the FLUSH command, which makes the daemon write to the RRD databases.
-L 127.0.0.1
Option -w specifies the interval (in seconds) the data will be written to disk.
-w 1800
Option -z defines a maximum delay which will be used to spread the write cycles over a certain range [0-delay] to avoid parallel write accesses. The value of option -z must not be larger than -w.
-z 1800
Option -p defines a PID file.
-p /var/run/rrdcached.pid
Option -j defines the path to a journaling directory. All requests will be logged there so that they can be processed after a restart in case the daemon crashes.
-j /var/cache/rrdcached
These options may result in a call of rrdcached with the following parameters
rrdcached -w 1800 -z 1800 -p /tmp/rrdcached.pid -j /tmp -l 127.0.0.1
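When the call is generated by a script, the rule that -z must not exceed -w can be checked before the daemon is started. The following Python sketch only assembles the argument list from the options described above; the function name and its parameters are hypothetical, and nothing is executed.

```python
def rrdcached_args(listen, write_interval, delay, pid_file, journal_dir):
    """Assemble an rrdcached command line from the options described above.

    Raises ValueError if the delay (-z) is larger than the write
    interval (-w), which the text above rules out.
    """
    if delay > write_interval:
        raise ValueError("-z must not be larger than -w")
    return [
        "rrdcached",
        "-w", str(write_interval),
        "-z", str(delay),
        "-p", pid_file,
        "-j", journal_dir,
        "-l", listen,
    ]


args = rrdcached_args("127.0.0.1", 1800, 1800, "/tmp/rrdcached.pid", "/tmp")
print(" ".join(args))
```

The resulting list could be handed to a process launcher of your choice; the values shown reproduce the example call above.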
RRDtool itself will be informed about the daemon using the option --daemon=<socket>.
rrdtool --daemon=127.0.0.1 update ...
Of course this has to correspond with the options of rrdcached!
Because two components of PNP have to be prepared for the use of rrdcached, there are changes in two config files.
1. Adjustment of process_perfdata.cfg for the data collector process_perfdata.pl
# EXPERIMENTAL rrdcached Support
# Use only with rrdtool svn revision 1511+
#
RRD_DAEMON_OPTS = 127.0.0.1:8888
2. Adjustment of config_local.php for the web interface
#
# EXPERIMENTAL rrdcached Support
# Use only with rrdtool svn revision 1511+
#
# $conf['RRD_DAEMON_OPTS'] = 'unix:/tmp/rrdcached.sock';
$conf['RRD_DAEMON_OPTS'] = '127.0.0.1:8888';
Starting with PNP-0.4.11 the sample files contain the relevant options.
NPCD (Nagios-Perfdata-C-Daemon) was written to provide an asynchronous mode to handle performance data with nagios.
In large nagios installations, your average check latency may increase to an unacceptably high value. This means that nagios should do a check at time x but actually does it y seconds later.
If you tell the nagios core to process the performance data after every single check, this works well up to a certain number of checks, but above this limit you will run into latency problems.
To reduce the number of actions for each check you can use the Bulk Mode, which gathers performance data for some time and then lets the nagios core execute the <host|service>_perfdata_file_processing_command, or you can tell nagios to just move the perfdata_files to a spool directory.
This move is a very fast action for the nagios core. The core is then done with the processing of performance data and can continue to do what it should do: executing other checks, sending notifications, and so on.
As mentioned above, the nagios process has finished its work once it has moved the performance data file to a spool directory, but this won't bring the data into the RRD files.
For this task you can start npcd, which watches the defined spool directory and starts an action for every file it finds.
After NPCD starts running it builds a list of the filenames found in perfdata_spool_dir, starts a new thread for every filename and executes the perfdata_file_run_cmd with the optional perfdata_file_run_cmd_args as an additional argument.
Since the perfdata files in the spool dir are in the same format as for the 'normal' bulk mode, NPCD should execute process_perfdata.pl in Bulk Mode.
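What NPCD does with the spool directory can be approximated in a few lines of Python. This is a simplified sketch, not NPCD's C implementation: it only builds the command lines instead of running them in threads, the spool file names are made up, and sorting by file name assumes that the names end in the $TIME_T$ timestamp.

```python
import os
import tempfile


def spool_commands(spool_dir, run_cmd, run_cmd_args=""):
    """Build one command line per spool file, sorted by file name."""
    commands = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if os.path.isfile(path):
            # <run_cmd> <run_cmd_args> <filename from spool dir>
            commands.append([run_cmd] + run_cmd_args.split() + [path])
    return commands


# Demonstration with a temporary spool directory and made-up file names.
with tempfile.TemporaryDirectory() as spool:
    for name in ("service-perfdata.1198759999", "service-perfdata.1198759883"):
        open(os.path.join(spool, name), "w").close()
    commands = spool_commands(
        spool, "/usr/local/nagios/libexec/process_perfdata.pl", "-b"
    )
    for command in commands:
        print(" ".join(command))
```

The file with the older timestamp comes first, so the RRD files would be updated in chronological order.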
Pro:
- Moving files is a very fast action for the nagios core, so it has more time for its own work.
- As long as nagios writes perfdata files to the spool dir, your data won't get lost if NPCD dies or you forgot to start it after a system reboot. NPCD will start with the first file found (they are sorted by the $TIME_T$ macro in chronological order) and update your RRD files.
Con:
- The RRD files are only updated at the interval configured in nagios (service_perfdata_file_processing_interval).
You have to control NPCD with its own configuration file, such as the supplied npcd.cfg-sample file.
Just rename it to npcd.cfg to start NPCD like this:
/usr/local/nagios/bin/npcd -f /usr/local/nagios/etc/pnp/npcd.cfg
or
/usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
to run in Daemon Mode (Background).
Hint: If you decide to not rename the config file, it might be overwritten by a future update of PNP.
These are the essential configuration directives for NPCD:
# Privilege Options
user = nagios
group = nagios

# Logging Options
log_type = syslog
log_file = /usr/local/nagios/var/npcd.log
max_logfile_size = 10485760
log_level=0

# Processing Options
perfdata_spool_dir = /usr/local/nagios/var/spool/perfdata/
perfdata_file_run_cmd = /usr/local/nagios/libexec/process_perfdata.pl
perfdata_file_run_cmd_args = -b

# Thread Options
npcd_max_threads=5

# greedy options
use_load_threshold = 0
load_threshold = 10.0

# Process Options
pid_file=/var/run/npcd.pid
- log_file: if log_type = file, this will be the logfile used
- The command is executed as <perfdata_file_run_cmd> <perfdata_file_run_cmd_args> <filename_from_perfdata_spool_dir>
- load_threshold: if use_load_threshold is set to 1, this load limit must not be exceeded