Nagios and monitoring syslog from a loghost

Anonymous Engineer

Monitoring with Nagios is easy: it is modular, and many checks are available, such as check_ping, check_http and check_mysql. With an add-on like NRPE, the Nagios server can even use check_nrpe to ask a remotely running NRPE daemon for the capacity of a filesystem, or to run other checks that would otherwise require you to be logged in to the remote server. You could grep for a string in a logfile that way, but there are better and more flexible ways to do that.

Loganalyzers (logsurfer)

There are many log analyzers and fancy tools, like Splunk or Logstash, that can put logging in a database and index it. There are also open source tools like Logsurfer, which is small, fast and just does what it’s meant to do. It is useful in large datacentres or cluster environments where you might have set up a central syslog server to analyze the logfiles, or you can just run it on a critical server and check the http or database logging, for example. It’s possible to build an IDS with Logsurfer as well.

Logsurfer generic information

The homepage of logsurfer is:
It’s out of scope to explain the exact working and configuration of logsurfer, so here is a good starting point and explanation of configuration lines:

The installation requires some work: there is no ready-to-go init script after running “make install”, and there isn’t a .spec file available (for the Red Hat Linux derivatives).

Logsurfer can be run in two ways: as a single run that analyzes a logfile, or as a daemon.

To configure Logsurfer you need to know about regular expressions. More on that can be found here:

A simple rule to catch and (separately) mail all lines containing “Warning” OR “warning” from a log file would look like this:

'(Warning|warning)' - - - 0
    pipe "/usr/local/bin/start-mail \"Warning message\"" "$0" 
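
To see which lines a pattern like this would select before putting it into a logsurfer configuration, the alternation can be tried out in isolation. The sketch below uses Python's re module purely as a stand-in for logsurfer's own regex matching (an assumption; the engines differ in details, though not for a pattern this simple), and the sample log lines are made up for illustration.

```python
import re

# Same alternation as in the rule above; Python's re is only a stand-in
# for logsurfer's own regex matching.
PATTERN = re.compile(r"(Warning|warning)")


def matching_lines(lines):
    """Return the lines the rule would act on (unanchored search)."""
    return [line for line in lines if PATTERN.search(line)]


# Made-up sample lines for illustration.
sample = [
    "Jan 12 03:14:07 web01 app: Warning: cache nearly full",
    "Jan 12 03:14:08 web01 app: request served",
    "Jan 12 03:14:09 db01 mysqld: warning: slow query",
    "Jan 12 03:14:10 db01 mysqld: WARNING: all caps is not matched",
]

for line in matching_lines(sample):
    print(line)
```

Only the first and third sample lines are selected. The all-caps “WARNING” line is skipped, which is a reminder that the alternation lists exactly two spellings and is not a case-insensitive match.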

Logsurfer configuration to alert on syslog lines on a loghost

When you have a Nagios server and a loghost on another server, it would be nice to be able to tell Nagios when there is a critical alert. Both the Nagios server side and the loghost side have to be configured.

Nagios server side:

A PASSIVE service check has to be set up in the Nagios configuration. The service description is important, because the alerts are logged against it. Let’s say it is LOG_ERROR.

To receive PASSIVE check alerts NSCA will be used, so the NSCA daemon has to be configured and started. It’s important that every host that can log to the loghost is defined with this service check: when the combination of host and service check is not defined, Nagios can still receive the alert, but it won’t know what to do with it.

Loghost side:

To send an alert, the send_nsca binary has to be installed and configured with the location of the Nagios server, the password and the encryption method to use.

Logsurfer has to be configured, and the alert has to be sent via a wrapper script (the “exec” action available in logsurfer is probably a bit confusing).

The wrapper script should contain something like this:

# usage: send-alert    
/bin/echo "$1 $2 $3 $4"| /usr/bin/send_nsca -H   -c /etc/nagios/send_nsca.cfg -d " "

Some explanation of this example script: the parameters $1 $2 $3 $4 come from logsurfer, which runs the script. /etc/nagios/send_nsca.cfg is the standard send_nsca configuration file, but it might be found somewhere else on the filesystem after installation.
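
For readers who prefer to see the wrapper's data flow spelled out, here is a minimal Python sketch of the same idea: build one space-delimited line (host, service, return code, message) and pipe it into send_nsca. The -H, -c and -d options and the two paths are taken from the shell wrapper above; the Nagios host name nagios.example.com and the function names are hypothetical placeholders, not part of the original setup.

```python
import subprocess

NAGIOS_HOST = "nagios.example.com"   # hypothetical; replace with your Nagios server
SEND_NSCA = "/usr/bin/send_nsca"     # path as used in the shell wrapper
SEND_NSCA_CFG = "/etc/nagios/send_nsca.cfg"


def format_alert(host, service, level, message):
    """Build the single input line: host, service, return code, message."""
    return f"{host} {service} {level} {message}\n"


def send_alert(host, service, level, message):
    """Pipe the formatted line into send_nsca, like echo | send_nsca does."""
    subprocess.run(
        [SEND_NSCA, "-H", NAGIOS_HOST, "-c", SEND_NSCA_CFG, "-d", " "],
        input=format_alert(host, service, level, message),
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Only prints the line; call send_alert() on a host with send_nsca installed.
    print(format_alert("web01", "LOG_ERROR", 2, "error: disk failure on sda"), end="")
```

Keeping the formatting in its own function makes it possible to check the exact line that would be sent without having a Nagios server at hand.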

The logsurfer.conf example:

'^.{16}([^ ]+) kernel: (error: .*)' - - - 0
    exec "/usr/local/bin/ $2 LOG_ERROR 2 \"$3\""

Some explanation of the rule and the parameters passed to the wrapper script:

A. The first 16 characters are for the timestamp.

B. $0 and $1 are already used by logsurfer, representing the logline, so $2 is the first positional parameter available for a string that you catch and save with round brackets.

C. The round brackets “save” the hostname ($2), matched by “([^ ]+)” (without the double quotes), and later the alert message ($3). Saving the hostname is important on a loghost, but of course not for a single server’s syslog file.

D. LOG_ERROR was the service description defined in the Nagios configuration

E. The number 2 is the alert level in Nagios (0=OK; 1=WARN; 2=CRIT; 3=UNKNOWN)

F. $3 is between escaped double quotes, because the error message contains blanks.

Of course it makes more sense to filter for a more specific error message; this was just to show how it works.
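
The mapping described in A to F can be checked against a sample line. The sketch below again uses Python's re module as a stand-in for logsurfer's regex engine (an assumption), and the syslog line and the function name wrapper_args are made up for illustration. Note that Python numbers the captured groups 1 and 2, where logsurfer calls them $2 and $3.

```python
import re

# Pattern from the logsurfer.conf example; Python's re stands in for
# logsurfer's regex engine here.
RULE = re.compile(r"^.{16}([^ ]+) kernel: (error: .*)")


def wrapper_args(logline, service="LOG_ERROR", level=2):
    """Return the arguments the wrapper would receive, or None if no match."""
    m = RULE.search(logline)
    if m is None:
        return None
    # Python groups 1 and 2 correspond to logsurfer's $2 and $3.
    host, message = m.group(1), m.group(2)
    return [host, service, str(level), message]


# Made-up syslog line; "Jan 12 03:14:07 " is exactly 16 characters long.
line = "Jan 12 03:14:07 web01 kernel: error: disk failure on sda"
print(wrapper_args(line))
```

For the sample line this yields the host web01, the service LOG_ERROR, the level 2 and the message “error: disk failure on sda”, which are the four values the wrapper script expects as $1 to $4.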

