On Wed, Jul 11, 2007 at 02:01:13PM +0200, Thomas Kaehn wrote:
> But is there also a proper way in Hobbit to take action on failed processes?
No. Hobbit only monitors things, it doesn't act to recover from any failures.
If you really want this, then the easiest way is probably to have a script on the Hobbit server that handles the service restart, and trigger it from an alerting script. Here's how:

First, set up monitoring of the "sshd" process in hobbit-clients.cfg with

    PROC sshd GROUP=ssh

You need the "GROUP" setting to be able to distinguish between different types of "procs" alerts.
Next, create /usr/local/bin/sshRecover.sh with the commands needed to restart ssh - you can use $BBHOSTNAME to get the name of the host that has the problem.
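As a rough illustration, such a script might look like the sketch below. Only $BBHOSTNAME comes from Hobbit; the restart_sshd function and the RESTART_CMD variable are hypothetical names made up for this example. The actual restart mechanism is left as a placeholder because it depends on your site, and note that logging in over ssh will not work while sshd itself is down on the target host.

```shell
#!/bin/sh
# Sketch of /usr/local/bin/sshRecover.sh (illustrative, not a tested recipe).
# RESTART_CMD is a hypothetical placeholder: set it to whatever command can
# restart sshd on a remote host at your site. It is called with the host
# name as its last argument. When unset, the script only reports what it
# would have done.

restart_sshd() {
    host="$1"
    if [ -z "$host" ]; then
        echo "sshRecover: no host name given" >&2
        return 1
    fi
    # Left unquoted on purpose so a multi-word command is split into words.
    ${RESTART_CMD:-echo would restart sshd on} "$host"
}

# Hobbit passes the name of the affected host in BBHOSTNAME.
# Do nothing when it is unset, e.g. when the file is sourced for testing.
if [ -n "${BBHOSTNAME:-}" ]; then
    restart_sshd "$BBHOSTNAME"
fi
```

You can try it by hand before hooking it into Hobbit by running it with BBHOSTNAME=hostA set in the environment; with RESTART_CMD unset it just prints the host it would act on.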
Finally, in hobbit-alerts.cfg you should have

    HOST=hostA,hostB,hostC SERVICE=procs GROUP=ssh SCRIPT /usr/local/bin/sshRecover.sh 0

to trigger the sshRecover.sh script when the "procs" column goes red due to the "sshd" process missing. The "0" at the end is a mandatory parameter in hobbit-alerts.cfg (the "recipient" if you read the man-page) but here it's just a dummy parameter.
Regards,
Henrik