A key part of administering any system is a monitoring and alerting setup that covers every relevant incident. There are many open source tools built for this purpose, some simple and others very complex, with different approaches and goals. Monitoring homogeneous servers is not the same as monitoring routers via SNMP, or a mix of different OSs, clusters, clouds…
One of my favorites is monit for Linux/*BSD servers. It is a “simple” program written in C, lightweight by design, with a versatility that other “agents” find hard to match. Its KISS philosophy won me over from the start, and with a little imagination it gives you the perfect watchdog for your server, whatever event you want to watch.
The ideal complement to monit is M/Monit: a centralized panel to which monit can, if you configure it, send all its metrics and alerts, giving you a unified view of all your systems. The only downside of M/Monit is that it is not open source and requires a license, but it is one of the few pieces of non-open source software I can recommend and whose license is worth buying, both for its relatively low cost and for its great added value. It is also made by the same developers as monit, which is open source and is already an essential tool on its own. By buying the M/Monit license you help improve monit.
But let us focus on monit, since in many cases it is enough for small infrastructures. Its advantage, as I mentioned, is how light it is as a running process: being written in C, it consumes very few resources, which makes it ideal even for embedded systems. It runs autonomously and also offers a small web panel that shows the status of the monitored services/events and lets you perform certain actions, such as stopping or restarting services. Its configuration may seem intimidating at first, but it is very well thought out and lets you monitor almost anything on your system: in addition to the built-in “checks”, it can use any external program as a source of events to monitor. Combined with shell scripts, this gives you options where the only limit is your imagination.
Let us look at a practical case on a FreeBSD server.
To perform the installation we can use the following simple Ansible playbook:
- name: Install monit
  pkgng: name=monit state=present
- name: create monit directories
  file: path={{item}} mode=0700 state=directory
  with_items:
    - /usr/local/etc/monit
    - /usr/local/etc/monit/scripts
    - /usr/local/etc/monit/conf.d
    - /var/lib/monit
    - /var/lib/monit/events
- name: monit logrotation
  copy:
    src: newsyslog.conf
    dest: /usr/local/etc/newsyslog.conf.d/monit.conf
    mode: "0600"
- name: copy monit conf
  template: src=monitrc dest=/usr/local/etc/monitrc owner=0 group=0 mode=0600
- name: enable monit on boot
  service: name=monit enabled=yes state=started
The content of the basic monitrc template:
set daemon 10 with start delay 10
set logfile /var/log/monit.log
set pidfile /var/run/monit.pid
set idfile /var/lib/monit/id
set statefile /var/lib/monit/state
set mailserver 127.0.0.1
set alert YOUR@EMAIL.COM not on { instance, action }
set httpd unixsocket /var/run/monit.sock
    uid root
    gid wheel
    permission 0600
    allow root:CHANGEMEPASS
set eventqueue
    basedir /var/lib/monit/events
    slots 100
include /usr/local/etc/monit/conf.d/*
check system $HOST
    if memory usage > 90% for 10 cycles then alert
    if swap usage > 20% for 10 cycles then alert
    if cpu usage (user) > 80% for 10 cycles then alert
    if cpu usage (system) > 30% for 10 cycles then alert
    if cpu usage (wait) > 10% for 10 cycles then alert
The file /usr/local/etc/newsyslog.conf.d/monit.conf:
/var/log/monit.log root:wheel 640 7 * $W1D01 JC /var/run/monit.pid
As you will have noticed, you need an active SMTP server to send alerts; one is normally installed by default on any Unix server, even if only for sending logs. You must also set the email address where alerts will be sent, and a user/password that the monit console client will use to talk to the monit daemon (monit is both the server and the local CLI client).
With this basic configuration our server's memory and CPU are already monitored with alerts. Checks run every 10 s (set daemon 10 with start delay 10), so a condition written as "for 10 cycles" has to hold for about 100 s before an alert is sent.
In this configuration the TCP port of the built-in web server is disabled, and the HTTP interface only listens on a Unix socket. I handle things much better in the console than with a web panel, and unless it is necessary I prefer not to have any TCP port open, especially for a process that runs with root privileges. monit is in principle a secure and well-programmed application, but anything can have a hidden exploit.
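Since monit is its own client, day-to-day work comes down to a handful of subcommands: monit -t checks the syntax of the control file, monit summary and monit status show what is being watched, and monit reload makes the daemon re-read its configuration. As a minimal sketch, the wrapper below reloads only when the syntax check passes. The function name monit_safe_reload and the MONIT variable are hypothetical helpers of mine, not part of monit.

```shell
#!/bin/sh
# Sketch: reload monit only if the control file passes the syntax check.
# MONIT is a hypothetical override so the function can be tried without
# a real daemon; by default the monit binary found in PATH is used.
monit_safe_reload() {
    monit_bin="${MONIT:-monit}"
    if "$monit_bin" -t >/dev/null 2>&1; then
        # Syntax is fine: ask the running daemon to re-read its configuration.
        "$monit_bin" reload
    else
        echo "monitrc syntax check failed, not reloading" >&2
        return 1
    fi
}

# Typical manual session (needs a running monit daemon):
#   monit -t         # validate the control file
#   monit summary    # one line per monitored item
#   monit status     # detailed status of every check
#   monit reload     # re-read the configuration
```

Calling monit_safe_reload after dropping a new file into /usr/local/etc/monit/conf.d means a typo is caught before the daemon is asked to reload.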
From here on, monit itself offers many more options, and I strongly recommend reviewing its documentation (https://mmonit.com/monit/documentation/monit.html) and its wiki with examples (https://mmonit.com/wiki/Monit/ConfigurationExamples).
If you want to connect monit with M/Monit you just have to indicate in its configuration a line like the following:
set mmonit https://USER:PASS@MONITSERVER:25083/collector
With this (and having created the corresponding user/password in M/Monit), in addition to notifications and alerts you get a panel with versatile, useful graphs and history, which is especially handy if you want to see how a system evolves over time and compare it with other servers. If you do not want to use M/Monit and rely only on monit alerts, you can always turn to another simple tool such as Munin, installed on the local server itself, for graphs, but that is another story.
Let us now look at another example of monit's versatility. Although it has no built-in “check” for CPU temperature, we can create an alert very simply with a shell script. On this FreeBSD server we will use the hwstat application and this simple script, which we will create in /usr/local/etc/monit/scripts/cpu1temp.sh:
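#!/bin/sh
# Integer part of the temperature reported on the CPU10 line
TEMP=$(/usr/local/bin/hwstat | grep CPU10 | awk '{print $2}' | cut -d '.' -f 1)
echo $TEMP
# The exit status is the value monit compares in the check
exit $TEMP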
In monit we will add the following check:
check program CPUTemp1 with path "/usr/local/etc/monit/scripts/cpu1temp.sh" timeout 2 seconds
    if status > 80 for 3 cycles then alert
In this way, the CPU temperature is checked every 10 s, and if it stays above 80 degrees for 3 cycles (30 s) an alert is sent.
Easy, right?
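One thing to keep in mind is that the value travels in the script's exit status, which can only hold 0 to 255, and that an empty reading would end the script with status 0, which looks like a cool CPU. A slightly more defensive variant could look like the sketch below. The function name temp_status, the LABEL variable and the choice of 255 for "no reading" are my own assumptions, and the "label in the line, temperature in the second field" layout is simply copied from the script above, not taken from hwstat documentation.

```shell
#!/bin/sh
# Sketch: turn one sensor line read from stdin into an exit status
# that monit can compare with "if status > N".
# LABEL and the field layout are assumptions taken from the original script.
LABEL="${LABEL:-CPU10}"

temp_status() {
    # Keep the integer part of the second field of the first matching line.
    temp=$(awk -v label="$LABEL" '$0 ~ label { sub(/\..*/, "", $2); print $2; exit }')
    case "$temp" in
        ''|*[!0-9]*)
            # No usable reading: return a high status so the check alerts.
            echo "unreadable"
            return 255
            ;;
    esac
    # Exit statuses only go up to 255; 255 itself is reserved for "no reading".
    if [ "$temp" -gt 254 ]; then
        temp=254
    fi
    echo "$temp"
    return "$temp"
}

# Intended use inside the monit script (hwstat path as in the article):
#   /usr/local/bin/hwstat | temp_status; exit $?
```

With this variant the same "if status > 80" rule also fires when the sensor cannot be read, instead of silently reporting success.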