In the first part of this article we saw what Nagios is and how to install it along with its plugins. We also looked briefly at which configuration files are necessary and how to install the sample configuration files. Now we will go through the configuration files one by one and configure one host, 'example.com', and two services on it, 'http' and 'ping', to be monitored. If something goes wrong with these services, two users, 'oktay' and 'verty', will be notified.
We first need to add our host definition and configure some options for that host. You can add as many hosts as you like, but we will stick with one host for simplicity.
hosts.cfg
# Generic host definition template
define host{
# The name of this host template - referenced in other host definitions, used for template recursion/resolution
name generic-host
# Host notifications are enabled
notifications_enabled 1
# Host event handler is enabled
event_handler_enabled 1
# Flap detection is enabled
flap_detection_enabled 1
# Process performance data
process_perf_data 1
# Retain status information across program restarts
retain_status_information 1
# Retain non-status information across program restarts
retain_nonstatus_information 1
# DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST,
# JUST A TEMPLATE!
register 0
}
# Host Definition
define host{
# Name of host template to use
use generic-host
host_name example.com
alias An Example Domain
address www.example.com
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}
The first host defined is not a real host but a template which other host definitions are derived from. This mechanism can be seen in other configuration files also and makes configuration based on a predefined set of defaults a breeze.
With this setup we are monitoring only one host, 'www.example.com', to see if it is alive. The 'host_name' parameter is important because the other configuration files refer to this server by that name.
Now we need to add this host to a hostgroup. Even though we will keep the configuration simple by defining a single host, we still have to associate it with a group so that the application knows which contact group (see below) to send notifications to.
hostgroups.cfg
define hostgroup{
hostgroup_name flcd-servers
alias The Free Linux CD Project Servers
contact_groups flcd-admins
members example.com
}
Above, we have defined a new hostgroup and associated the 'flcd-admins' contact group with it. Now let's look into the contactgroup settings.
contactgroups.cfg
define contactgroup{
contactgroup_name flcd-admins
alias FreeLinuxCD.org Admins
members oktay, verty
}
We have defined the contact group 'flcd-admins' and added two members, 'oktay' and 'verty', to it. This configuration ensures that both users will be notified when something goes wrong with a server that 'flcd-admins' is responsible for. (Individual notification preferences can override this.) The next step is to set the contact information and notification preferences for these users.
contacts.cfg
define contact{
contact_name oktay
alias Oktay Altunergil
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email,notify-by-epager
host_notification_commands host-notify-by-email,host-notify-by-epager
email oktay@example.com
pager dummypagenagios-admin@localhost.localdomain
}
define contact{
contact_name verty
alias David 'Verty' Ky
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email,notify-by-epager
host_notification_commands host-notify-by-email
email verty@example.com
}
In addition to providing contact details for a particular user, the 'contact_name' in contacts.cfg is also used by the CGI scripts (i.e., the Web interface) to determine whether a particular user is allowed to access a particular resource. You will need to configure .htaccess-based basic HTTP authentication to use the Web interface, but that alone is not enough: the usernames people log in with must also be defined as contacts, as seen above, before those users can access any of the resources.
Now that we have our hosts and contacts configured, we can start configuring the individual services to be monitored on our server.
services.cfg
# Generic service definition template
define service{
# The 'name' of this service template, referenced in other service definitions
name generic-service
# Active service checks are enabled
active_checks_enabled 1
# Passive service checks are enabled/accepted
passive_checks_enabled 1
# Active service checks should be parallelized
# (disabling this can lead to major performance problems)
parallelize_check 1
# We should obsess over this service (if necessary)
obsess_over_service 1
# Default is to NOT check service 'freshness'
check_freshness 0
# Service notifications are enabled
notifications_enabled 1
# Service event handler is enabled
event_handler_enabled 1
# Flap detection is enabled
flap_detection_enabled 1
# Process performance data
process_perf_data 1
# Retain status information across program restarts
retain_status_information 1
# Retain non-status information across program restarts
retain_nonstatus_information 1
# DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
register 0
}
# Service definition
define service{
# Name of service template to use
use generic-service
host_name example.com
service_description HTTP
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups flcd-admins
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
check_command check_http
}
# Service definition
define service{
# Name of service template to use
use generic-service
host_name example.com
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups flcd-admins
notification_interval 120
notification_period 24x7
notification_options c,r
check_command check_ping!100.0,20%!500.0,60%
}
Using the above setup, we are configuring two services to be monitored. The first service definition, which we have called HTTP, monitors whether the Web server is up and notifies us if there's a problem. The second definition monitors ping statistics for the server and notifies us if the response time increases too much or if there's too much packet loss, either of which is a sign of network trouble. The two arguments after 'check_ping' are the warning and critical thresholds, each given as a round-trip time in milliseconds followed by a packet loss percentage. The commands we use to accomplish this are 'check_http' and 'check_ping', which were installed into the 'libexec' directory when we installed the plugins. Please take your time to get familiar with the other available plugins and configure them similarly to the above definitions.
You can also write your own plugins to do custom monitoring. For instance, there's no plugin to check if Tomcat is up or down. You could simply write a script that loads a default JSP page on a remote Tomcat server and returns a success or failure status based on the presence or absence of a predefined text value (e.g., "Tomcat is up") on the page. (In such a case you would need to add a definition for this custom command to your checkcommands.cfg file, which we have not touched.)
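The Tomcat idea can be sketched as a short shell script. This is only a minimal illustration under a few assumptions: the script name check_tomcat.sh, the function names, and the URL in the comment are hypothetical, and 'curl' is assumed to be installed on the monitoring server. It relies on the usual plugin convention that the exit status tells Nagios the result (0 for OK, 2 for CRITICAL).

```shell
#!/bin/bash
# check_tomcat.sh - hypothetical custom plugin, not part of the Nagios plugins.
# Plugin convention: exit 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

# Read a page body on stdin and decide the status from the expected text.
evaluate_page() {
    local expected="$1"
    if grep -qF -- "$expected"; then
        echo "TOMCAT OK - found \"$expected\""
        return 0
    fi
    echo "TOMCAT CRITICAL - \"$expected\" not found"
    return 2
}

# Fetch the page with curl (assumed to be installed) and evaluate it.
check_tomcat() {
    local url="$1"
    local expected="${2:-Tomcat is up}"
    local body
    if ! body=$(curl -s -f --max-time 10 "$url"); then
        echo "TOMCAT CRITICAL - could not load $url"
        return 2
    fi
    printf '%s\n' "$body" | evaluate_page "$expected"
}

# Run the check only when a URL is given on the command line, e.g.
#   check_tomcat.sh http://www.example.com:8080/status.jsp "Tomcat is up"
# (the URL and page name are made up for this example).
if [ -n "${1:-}" ]; then
    check_tomcat "$@"
    exit $?
fi
```

To use something like this from Nagios, the script would be copied into the 'libexec' directory and given its own command definition, which a service definition could then name in its 'check_command' line.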
Now that we have configured the hosts and the services to monitor, we are ready to fire up Nagios and start monitoring. We will start Nagios using the init script that we had installed earlier.
root@ducati:/usr/local/nagios/etc# /etc/rc.d/rc.nagios start
Starting network monitor: nagios
/bin/bash: -l: unrecognized option
[ ... ]
If you receive the above error message, it means the 'su' command installed on your server does not support the '-l' option. To fix it, open /etc/rc.d/rc.nagios (or its equivalent on your system) and remove the 'l' where it says 'su -l'. You will end up with 'su -', which means the same thing. After making the change, run the startup command again. If you receive 'permission denied' errors, just reset the ownership of your Nagios installation directory and they will be resolved.
root@ducati:/usr/local/nagios# chown -R nagios /usr/local/nagios
root@ducati:/usr/local/nagios# chgrp -R nagios /usr/local/nagios
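If Nagios still refuses to start, a mistake in one of the configuration files is a likely cause. Nagios can check its own configuration when run with the '-v' option, which parses the files and reports errors without starting the daemon. The wrapper below is a minimal sketch: the paths assume a default source install under /usr/local/nagios, and 'verify_nagios_config' is just an illustrative name.

```shell
#!/bin/bash
# Assumed default locations for an install under /usr/local/nagios.
NAGIOS_BIN=/usr/local/nagios/bin/nagios
NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg

# Run Nagios in verify mode (-v): it parses the configuration, reports
# errors and exits without starting the monitoring daemon.
verify_nagios_config() {
    local bin="${1:-$NAGIOS_BIN}"
    local cfg="${2:-$NAGIOS_CFG}"
    if [ ! -x "$bin" ]; then
        echo "nagios binary not found at $bin" >&2
        return 3
    fi
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 3
    fi
    "$bin" -v "$cfg"
}

# Typical use before (re)starting the daemon:
#   verify_nagios_config && /etc/rc.d/rc.nagios start
```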
If everything went smoothly, Nagios should now be running. The following command will show you whether Nagios is up and running and the process ID associated with it, if it is indeed running.
root@ducati:/usr/local/nagios# /etc/rc.d/rc.nagios status
PID TTY TIME CMD
22645 ? 00:00:00 nagios
The same command will stop Nagios when called with the 'stop' parameter instead of 'start' or 'status'.
Although Nagios has already started monitoring and is going to send us the notifications if and when something goes wrong, we need to set up the Web interface to be able to interactively monitor services and hosts in real time. The Web interface also gives a view of the big picture by making use of graphics and statistical information.
Of course, we need to have a Web server already set up in order to access the Nagios Web interface. For this article we will assume that we are running the Apache Web server. I will use the exact same configuration that is included in the official Nagios documentation because it works fine.
httpd.conf
ScriptAlias /nagios/cgi-bin/ /usr/local/nagios/sbin/
<Directory "/usr/local/nagios/sbin/">
AllowOverride AuthConfig
Options ExecCGI
Order allow,deny
Allow from all
</Directory>
Alias /nagios/ /usr/local/nagios/share/
<Directory "/usr/local/nagios/share">
Options None
AllowOverride AuthConfig
Order allow,deny
Allow from all
</Directory>
This configuration creates a Web alias '/nagios/cgi-bin/' and directs it to the CGI scripts in your Nagios 'sbin' directory, and a second alias '/nagios/' that points to the 'share' directory. Assuming your main Web site is set up at http://127.0.0.1, you will be able to access the Nagios Web interface at http://127.0.0.1/nagios/. At this point, the Nagios Web interface should come up properly, but you will notice that you cannot access any of the pages. You will get an error message that looks like the following.
It appears as though you do not have permission to view information for any of the hosts you requested... If you believe this is an error, check the HTTP server authentication requirements for accessing this CGI and check the authorization options in your CGI configuration file.
This is a security precaution designed to allow only authorized people to access the monitoring interface. The authentication is handled by your Web server using Basic HTTP Authentication (configured here through .htaccess). Nagios then takes the username of the person who has logged in and matches it against the contact_name entries in contacts.cfg to determine which sections of the Web interface the current user can access.
Configuring .htaccess based authentication is easy provided that your Web server is already configured to use it. Please refer to the documentation for your Web server if it's not configured. We will assume that our Apache server is configured to look at the .htaccess file and apply the directives found in it.
First, create a file called .htaccess in the /usr/local/nagios/sbin directory. If you would like to lock up your Nagios Web interface completely, you can also put a copy of the same file in the /usr/local/nagios/share directory.
Put the following in this .htaccess file.
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
require valid-user
When you're adding your first user, the password file that .htaccess refers to will not be present. You need to run the 'htpasswd' command with the -c option to create the file.
htpasswd -c /usr/local/nagios/etc/htpasswd.users oktay
New password: ******
Re-type new password: ******
Adding password for user oktay
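Because '-c' recreates the password file, using it a second time by mistake would wipe the users added so far. A small wrapper can pick the option automatically. This is only a sketch: 'create_flag' and 'add_nagios_user' are hypothetical helper names, and the file path is the one used in the .htaccess file above.

```shell
#!/bin/bash
# Password file named in the .htaccess file above.
HTPASSWD_FILE=/usr/local/nagios/etc/htpasswd.users

# Print "-c" only when the password file does not exist yet, so that an
# existing file is never recreated (and emptied) by accident.
create_flag() {
    if [ -e "$1" ]; then
        echo ""
    else
        echo "-c"
    fi
}

# Add one user; htpasswd prompts for the password interactively.
add_nagios_user() {
    local user="$1"
    local file="${2:-$HTPASSWD_FILE}"
    # $(create_flag ...) is left unquoted on purpose: an empty result must
    # vanish instead of being passed to htpasswd as an empty argument.
    htpasswd $(create_flag "$file") "$file" "$user"
}

# Example: add both contacts defined in contacts.cfg.
#   for user in oktay verty; do add_nagios_user "$user"; done
```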
For the rest of your users, use the 'htpasswd' command without the '-c' option so as not to overwrite the existing file. After you add all of your users, you can go back to the Web interface, which will now pop up an authentication dialog. Upon successful authentication, you can start using the Web interface. I will not go into detail about using the Web interface since it's pretty self-explanatory.
Notice that your users will only be able to access information for servers that they are associated with in the Nagios configuration files. Also, some sections of the Web interface will be disabled for everyone by default. If you would like to enable those, take a look at 'etc/cgi.cfg'. For instance, in order to allow the user 'oktay' to access the 'Process Info' section, uncomment the 'authorized_for_system_information' line and add 'oktay' to the comma-delimited list of names.
This is all you need to install and configure Nagios to do basic monitoring of your servers and the individual services on them. You can then fine-tune your monitoring system by going through all of the configuration files and modifying them to match your needs and requirements. Going through the plugins in the libexec directory will also give you a lot of ideas about what local and remote services you can monitor. Nagios also comes with software that can be used to monitor a server's disk and load status remotely. Finally, Nagios comes with so many features that no single article could explain all of them. Please refer to the official documentation for more advanced topics that aren't covered here.
Happy hacking.
Official Nagios Web Site: http://www.nagios.org
Official NetSaint Web Site: http://www.netsaint.org
Nagios Plugins: http://nagiosplug.sourceforge.net
Nagios ScreenShots: http://www.nagios.org/screenshot.php
htpasswd man Page: http://www.rt.com/man/htpasswd.1.html
Oktay Altunergil works for a national web hosting company as a developer concentrating on web applications on the Unix platform.
Copyright © 2009 O'Reilly Media, Inc.