Howto: Centrally maintain a large number of sites with Jabber.

Víctor Fariña Projects, Technology

Page of the project
Here is the problem I ran into at work:

The problem:

When you maintain and update a large number of Unix (or Windows) machines, you have to check that all of them are up and, when one fails, do something about it (well, that depends on the admin 😉 ). You also want to check that everything is running fine before a weekend (admins tend to be very lazy).

The solution:

The architecture of the application is simple:
You rely on one machine (the central server or backup server), and on it you install NAGIOS (http://www.nagios.org/download/), a program written in C that monitors the health of each server. You have to set up a lot of configuration files (it takes half a day or so, but it's worth it) that state which servers to monitor and which services each server offers, the alerts, the methods of communication and so on …
The next step (once Nagios is installed and tested) is to install sendJabber (only available from CVS at the moment), configure it with a valid account on a Jabber server, and tell it which user or users to send alerts to.
Next, configure Nagios again to send alerts through sendJabber; it is as simple as adding a line to misccommands.cfg. Then restart Nagios and everything is set.
Just sit in front of your computer; if something fails, you should receive an alert on your Jabber client (I recommend GAIM).
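
To make the flow above a bit more concrete, here is a small Python sketch of the idea: check whether a service answers and, if it does not, build an alert text and hand it to each recipient. This is not how Nagios or sendJabber are implemented; the function names (is_service_up, format_alert, check_and_alert), the message layout and the example address are all made up for illustration, and the actual Jabber delivery is left as a `send` function you pass in yourself.

```python
import socket
from typing import Callable, Iterable, List


def is_service_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False


def format_alert(host: str, service: str, state: str, detail: str) -> str:
    """Build a one-line alert text (hypothetical layout, not a Nagios format)."""
    return f"[{state}] {service} on {host}: {detail}"


def check_and_alert(
    host: str,
    port: int,
    service: str,
    recipients: Iterable[str],
    send: Callable[[str, str], None],
) -> List[str]:
    """Check one service; if it is down, send an alert to every recipient.

    `send(recipient, message)` is a stand-in for whatever really delivers
    the message (for example a Jabber client). Returns the recipients
    that were notified, or an empty list if the service is up.
    """
    if is_service_up(host, port):
        return []
    message = format_alert(
        host, service, "CRITICAL", f"no answer on TCP port {port}"
    )
    notified = []
    for recipient in recipients:
        send(recipient, message)
        notified.append(recipient)
    return notified


if __name__ == "__main__":
    # Example run: print the alert instead of sending it over Jabber.
    # Host, port and address below are placeholders.
    check_and_alert(
        "127.0.0.1",
        8080,
        "web",
        ["admin@example.org"],
        lambda recipient, message: print(f"to {recipient}: {message}"),
    )
```

Passing `send` in as a parameter keeps the check and the delivery separate, which is roughly the split between Nagios (which decides that something is wrong) and sendJabber (which only delivers the message).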

I will post a more extensive howto in the next few days.

TODO:

In future releases of sendJabber you will be able to interact with the NAGIOS server, send commands and receive status about servers.
and more …