Monitoring: Nagios

Monitoring Basics

What types of resources should we monitor?

What is Nagios?

History of Nagios

Components of Nagios

Core server process
  • Core logic for monitoring
  • Keeps track of service states
  • Starts service checks
CGI Web interface
  • Simple web interface that reads the status data written by the core server process and submits commands through its external command file

Components of Nagios

Plugins
  • Scripts written to gather monitoring information
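  • Often written in Perl, shell, or C, but can be written in almost any language
  • Follow a simple convention for your own plugin: print one line of status text and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN)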
NRPE/NSCA
  • Daemons that handle remote checks
  • NRPE: Active checking daemon (runs a check when the Nagios server asks for it)
  • NSCA: Passive checking daemon (just listens for data). The client must:
    • Run the check (and schedule it)
    • Send the data to NSCA using send_nsca
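
The plugin convention described above can be sketched in a few lines of shell. This is a minimal illustration, not one of the stock plugins: the function name check_threshold and its arguments are hypothetical, and it only handles non-negative integers. It prints one status line with performance data after the "|" and returns the matching exit code.

```shell
#!/bin/sh
# Hypothetical plugin sketch: compare a value against warning/critical thresholds.
# Exit codes follow the plugin convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
check_threshold() {
  if [ "$#" -ne 4 ]; then
    echo "UNKNOWN - usage: check_threshold LABEL VALUE WARN CRIT"
    return 3
  fi
  label=$1; value=$2; warn=$3; crit=$4

  # Anything that is not a non-negative integer is reported as UNKNOWN.
  for n in "$value" "$warn" "$crit"; do
    case $n in
      ''|*[!0-9]*) echo "UNKNOWN - '$n' is not a non-negative integer"; return 3 ;;
    esac
  done

  # Performance data uses the label=value;warn;crit layout.
  perf="$label=$value;$warn;$crit"
  if [ "$value" -ge "$crit" ]; then
    echo "CRITICAL - $label is $value | $perf"; return 2
  elif [ "$value" -ge "$warn" ]; then
    echo "WARNING - $label is $value | $perf"; return 1
  fi
  echo "OK - $label is $value | $perf"
  return 0
}

# In a real plugin script the last lines would be something like:
#   check_threshold users "$(who | wc -l)" 5 10
#   exit $?
```

Calling check_threshold users 3 5 10 prints "OK - users is 3 | users=3;5;10" and returns 0; a value of 7 returns 1 and a value of 12 returns 2.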

Passive vs. Active

_images/nrpe.png
_images/nsca.png

Images from nagios.org documentation site

Active: NRPE

_images/activechecks.png

Problems with Active checks

What kind of problems would we have?

Passive: NSCA

_images/passivechecks.png
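
As a rough sketch of what the client side of a passive check looks like: send_nsca reads one tab-separated line per service result (host name, service description, return code, plugin output). The helper name format_passive_result, the host web01, the service name, and the server and config path in the commented usage are all assumptions for illustration.

```shell
#!/bin/sh
# Build one passive service check result in the tab-separated layout
# that send_nsca reads on standard input:
#   host_name <TAB> service_description <TAB> return_code <TAB> plugin_output
format_passive_result() {
  printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Usage (not run here; server name and config path are assumptions):
#   format_passive_result web01 "Nightly Backup" 0 "OK - backup finished" |
#     send_nsca -H nagios.example.org -c /etc/nagios/send_nsca.cfg
```

The client schedules this itself (for example from cron or at the end of a backup job); the Nagios server only receives the result.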

When are Passive checks useful?

CheckMK

_images/check_mk.png

Check_MK is an extension to Nagios that offers more flexibility in how servers are checked.

CheckMK Architecture

Plugins

# Install EPEL repo first!
$ yum install nrpe 'nagios-plugins*'
$ cd /usr/lib64/nagios/plugins
$ ./check_ssh localhost
SSH OK - OpenSSH_6.6.1 (protocol 2.0) | time=0.188930s;;;0.000000;10.000000

$ ./check_disk -w 15% -c 10%
DISK OK - free space: / 8223 MB (85% inode=92%); /dev 235 MB (100% inode=99%);
/dev/shm 244 MB (100% inode=99%); /run 240 MB (98% inode=99%); /sys/fs/cgroup
244 MB (100% inode=99%); /run/user/1000 48 MB (100% inode=99%);|
/=1376MB;8539;9041;0;10046 /dev=0MB;199;211;0;235 /dev/shm=0MB;207;219;0;244
/run=4MB;207;219;0;244 /sys/fs/cgroup=0MB;207;219;0;244
/run/user/1000=0MB;40;43;0;48

$ ./check_http -H osuosl.org
HTTP OK: HTTP/1.1 200 OK - 40668 bytes in 0.013 second response time
| time=0.013421s;;;0.000000 size=40668B;;;0
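
Each plugin output above has two parts: human-readable status text, then performance data after the "|". A small sketch of separating the two in shell follows; the helper names plugin_text and plugin_perfdata are hypothetical, and the sketch assumes single-line output.

```shell
#!/bin/sh
# Print the status text: everything before the first "|", trailing spaces removed.
plugin_text() {
  text=${1%%|*}
  trailing=${text##*[! ]}      # the run of spaces at the end, if any
  printf '%s\n' "${text%"$trailing"}"
}

# Print the performance data: everything after the first "|", leading spaces removed.
# Prints an empty line when the output carries no performance data.
plugin_perfdata() {
  case $1 in
    *\|*)
      perf=${1#*|}
      leading=${perf%%[! ]*}   # the run of spaces at the start, if any
      printf '%s\n' "${perf#"$leading"}"
      ;;
    *) printf '\n' ;;
  esac
}
```

For the check_ssh line shown earlier, plugin_text returns "SSH OK - OpenSSH_6.6.1 (protocol 2.0)" and plugin_perfdata returns "time=0.188930s;;;0.000000;10.000000".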

NRPE Configuration

# /etc/nagios/nrpe.cfg on the remote host
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1

# Command run on the Nagios server
check_nrpe -H remotehost.example.org -c check_load

# Testing it on a local machine
$ systemctl start nrpe
$ /usr/lib64/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_load
OK - load average: 0.04, 0.13, 0.07|load1=0.040;15.000;30.000;0;
load5=0.130;10.000;25.000;0; load15=0.070;5.000;20.000;0;
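
The name passed to check_nrpe -c must match a command[...] entry in the remote host's NRPE configuration. As a convenience, the sketch below lists the names a given file defines; list_nrpe_commands is a hypothetical helper, and it only looks at the one file it is given (not at any included directories).

```shell
#!/bin/sh
# Print the command names defined by command[NAME]=... lines in an NRPE config file.
# Comment lines and other directives are skipped because they do not match.
list_nrpe_commands() {
  sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Usage (path as used on the remote host above):
#   list_nrpe_commands /etc/nagios/nrpe.cfg
```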

Nagios Configuration Overview

_images/nagiosconfig.png

Nagios configuration visualized

Nagios Config components

Main configuration file
  • Configures how the daemon operates
Resource file(s)
  • User-defined $USERn$ macros (e.g. plugin paths, credentials)
Object definition files
  • Define hosts, services, hostgroups, contacts, contactgroups, commands
CGI configuration file
  • How the web interface is set up

Main configuration file

/etc/nagios/nagios.cfg

Main configuration file options

log_file=/var/log/nagios/nagios.log
cfg_file=/etc/nagios/objects/commands.cfg
cfg_file=/etc/nagios/objects/contacts.cfg
cfg_file=/etc/nagios/objects/timeperiods.cfg
cfg_file=/etc/nagios/objects/templates.cfg
cfg_file=/etc/nagios/objects/localhost.cfg
cfg_dir=/etc/nagios/conf.d
status_update_interval=10
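
After editing the main configuration file or any object file it references, it is worth verifying the configuration before reloading the daemon; nagios -v does that check. The wrapper below is a sketch only: reload_if_valid, NAGIOS_BIN, and RELOAD_CMD are hypothetical names, and the default reload command assumes a systemd host whose nagios unit supports reload.

```shell
#!/bin/sh
# Reload Nagios only when the configuration passes verification.
# NAGIOS_BIN and RELOAD_CMD are hypothetical overrides for this example.
reload_if_valid() {
  cfg=${1:-/etc/nagios/nagios.cfg}
  # "nagios -v FILE" parses the main config and every object file it references.
  if "${NAGIOS_BIN:-nagios}" -v "$cfg" >/dev/null 2>&1; then
    ${RELOAD_CMD:-systemctl reload nagios}
  else
    echo "configuration check failed for $cfg; not reloading" >&2
    return 1
  fi
}

# Usage:
#   reload_if_valid /etc/nagios/nagios.cfg
```

Running nagios -v by hand without the output redirection shows the exact file and line of any error.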

Resource configuration file(s)

/etc/nagios/private/resource.cfg

# Sets $USER1$ to be the path to the plugins
$USER1$=/usr/lib64/nagios/plugins
# Sets $USER2$ to be the path to event handlers
#$USER2$=/usr/lib64/nagios/plugins/eventhandlers
# Store some usernames and passwords (hidden from the CGIs)
#$USER3$=someuser
#$USER4$=somepassword

Object Configuration Overview

/etc/nagios/objects

Objects Defined

Hosts
The central objects in the monitoring logic. Hosts are usually physical devices, have an IP address, have one or more services assigned to them, and can have parent/child relationships with other hosts.
Host Groups
Groups of one or more hosts. Groups can make it easier to view the status of related hosts from the web interface and simplify the configuration.
Services
The other central objects in the monitoring logic, always associated with a host. Services can be attributes of a host (CPU load, disk usage, etc.), services provided by the host (HTTP, SSH, etc.), or other things associated with the host (DNS records, etc.).

Objects Defined

Service Groups
Groups of one or more services. Service groups can make it easier to view the status of related services in the web interface and simplify the configuration.
Contacts
People involved in the notification process. Contacts have one or more notification methods (cell phone, pager, email, etc.), and receive notifications for hosts and services they are responsible for.
Contact Groups
Groups of one or more contacts. Contact groups can make it easier to define all the people who get notified when certain host or service problems occur.

Objects Defined

Time Periods
Used to control when hosts and services can be monitored and when contacts can receive notifications.
Commands
Used to tell Nagios what programs, scripts, etc. it should execute to perform its tasks. Tasks may include host and service checks, notifications, event handlers and much more.

Object Definition Examples

# Host definition
define host {
  host_name      foo
  alias          foo.example.org
  address        10.0.0.100
  use            generic-host
  hostgroups     nrpe-hosts,ping-hosts
  contact_groups admins
}

# Host Group definition
define hostgroup {
  hostgroup_name  example-servers
  alias           Example Servers
  members         foo,bar
}

# Service definition
define service {
  use                 generic-service
  hostgroup_name      nrpe-hosts
  service_description SSH
  check_command       check_ssh
}

# Contact definition
define contact {
  contact_name  nagiosadmin
  use           generic-contact
  alias         Nagios Admin
  email         nagios@example.org
}

# Contact group definition
define contactgroup {
  contactgroup_name admins
  alias             Nagios Admins
  members           nagiosadmin
}

# 'workhours' Time Period definition
define timeperiod {
  timeperiod_name workhours
  alias           Normal Work Hours
  monday          09:00-17:00
  tuesday         09:00-17:00
  wednesday       09:00-17:00
  thursday        09:00-17:00
  friday          09:00-17:00
}

# Command definition
define command {
  command_name    check_ping
  command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w \
    $ARG1$ -c $ARG2$ -p 5
}
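
To make the macros in the command definition above concrete: a service that sets check_command to something like check_ping!100.0,20%!500.0,60% passes the values after each "!" as $ARG1$ and $ARG2$, while $USER1$ comes from the resource file and $HOSTADDRESS$ from the host's address. The sketch below imitates that substitution in shell; expand_command_line is a hypothetical helper, not how Nagios itself is implemented, and it assumes the values contain no "|", "&", or backslash characters.

```shell
#!/bin/sh
# Imitate macro expansion for a command_line template.
#   $1 = command_line template     $2 = check_command value (name!arg1!arg2)
#   $3 = host address              $4 = value of $USER1$
expand_command_line() {
  template=$1; check_command=$2; address=$3; user1=$4

  # Split "name!arg1!arg2" on "!" without pathname expansion.
  old_ifs=$IFS; IFS='!'
  set -f
  set -- $check_command
  set +f
  IFS=$old_ifs
  arg1=${2-}; arg2=${3-}

  # [$] matches a literal dollar sign, so each macro is replaced as plain text.
  printf '%s\n' "$template" | sed \
    -e "s|[\$]USER1[\$]|$user1|g" \
    -e "s|[\$]HOSTADDRESS[\$]|$address|g" \
    -e "s|[\$]ARG1[\$]|$arg1|g" \
    -e "s|[\$]ARG2[\$]|$arg2|g"
}
```

With the template from the command definition, the check_command value check_ping!100.0,20%!500.0,60%, the address 10.0.0.100, and /usr/lib64/nagios/plugins for $USER1$, the result is /usr/lib64/nagios/plugins/check_ping -H 10.0.0.100 -w 100.0,20% -c 500.0,60% -p 5.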

CGI Configuration File

/etc/nagios/cgi.cfg

main_config_file=/etc/nagios/nagios.cfg
physical_html_path=/usr/share/nagios/html
url_html_path=/nagios
use_authentication=1
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin

Installing Nagios

$ yum install epel-release
$ yum install nagios
$ systemctl start nagios httpd

Resources