Well, we think we've solved the problem.
The man page for hobbit-alerts.cfg shows that the SCRIPT command takes two parameters, the path to the script and the RecipientID, and I had left out the RecipientID (so SCRIPT was probably using "SERVICE=disk,conn" as the second argument, which is why it was being ignored as an alert criterion).
Doesn't really matter what I use for RecipientID, as my script doesn't use it, but putting any old string there works. The SCRIPT is now only firing off when the SERVICE matches the actual alert.
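To illustrate what seems to have been happening, here is a toy model in Python of how a SCRIPT line gets split up. It is only a sketch, not Hobbit's actual parser: the function name parse_script_line and the recipient ID "ron-script" are made up for the example, and the one assumption is that the word after the script path is always taken as the RecipientID.

```python
def parse_script_line(line):
    """Toy model (NOT Hobbit's real parser) of a SCRIPT recipient line.

    Assumption: word 1 is SCRIPT, word 2 is the script path, word 3 is
    always taken as the RecipientID, and everything after it is criteria.
    """
    words = line.split()
    if len(words) < 3 or words[0] != "SCRIPT":
        raise ValueError("expected: SCRIPT /path/to/script RecipientID [criteria...]")
    return {"script": words[1], "recipient": words[2], "criteria": words[3:]}


# The line as I originally had it: no RecipientID.
broken = parse_script_line(
    "SCRIPT /home/hobbit/server/ext/mail-format-alert.py "
    "SERVICE=disk,conn DURATION>1m REPEAT=20 COLOR=yellow,red RECOVERED TIME=W:0000:2359"
)

# The same line with a placeholder RecipientID ("ron-script" is hypothetical).
fixed = parse_script_line(
    "SCRIPT /home/hobbit/server/ext/mail-format-alert.py ron-script "
    "SERVICE=disk,conn DURATION>1m REPEAT=20 COLOR=yellow,red RECOVERED TIME=W:0000:2359"
)

print(broken["recipient"])    # SERVICE=disk,conn (swallowed as the recipient)
print(fixed["recipient"])     # ron-script
print(fixed["criteria"][0])   # SERVICE=disk,conn (now an actual criterion)
```

In the broken case no SERVICE= criterion is left on the line, which would explain why every alert matched both SCRIPT lines, and it agrees with the "and recipient SERVICE=conn,disk" text in the test output in my original message.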
My apologies for not reading the man pages well enough. I'm only posting the solution in case someone else someday finds the original question in the archives.
-----Original Message-----
From: Ron Rosenkoetter
Sent: Thursday, July 24, 2008 2:08 PM
To: 'hobbit at hswn.dk'
Subject: SCRIPT alert problem with SERVICE=
I'm having problems understanding how matching is done by hobbit for sending alerts via the SCRIPT command. I have separated out the services I want to monitor into two different SCRIPT events, because we want to monitor disk and connection alerts 24/7, but cpu, mem, tbntp, and procs only during the day.
Here's the relevant part of the hobbit-alerts.cfg file:
PAGE=KC/KC-Test
    SCRIPT /home/hobbit/server/ext/mail-format-alert.py SERVICE=disk,conn DURATION>1m REPEAT=20 COLOR=yellow,red RECOVERED TIME=W:0000:2359
    SCRIPT /home/hobbit/server/ext/mail-format-alert.py SERVICE=cpu,mem,tbntp,procs DURATION>1m REPEAT=20 COLOR=yellow,red RECOVERED TIME=W:0700:1700
    MAIL ron at whatever.com SERVICE=cpu,mem,tbntp,procs DURATION>1m REPEAT=20 COLOR=yellow,red RECOVERED TIME=W:0000:2359
    MAIL ron at whatever.com SERVICE=conn,disk DURATION>1m REPEAT=20 COLOR=yellow,red RECOVERED TIME=W:0700:1700
However, whenever hobbit generates ANY alert it matches on both SCRIPT lines, and sends the alert twice. It doesn't appear to care about the SERVICE parameter.
Here are the results I get when running a test alert:
../bin/bbcmd hobbitd_alert --test test2003 cpu 500
00016539 2008-07-24 13:52:16 send_alert test2003:cpu state Paging
00016539 2008-07-24 13:52:16 Matching host:service:page 'test2003:cpu:KC/KC-Test' against rule line 124
00016539 2008-07-24 13:52:16 *** Match with 'PAGE=KC/KC-Test' ***
00016539 2008-07-24 13:52:16 Matching host:service:page 'test2003:cpu:KC/KC-Test' against rule line 125
00016539 2008-07-24 13:52:16 *** Match with 'SCRIPT /home/hobbit/server/ext/mail-format-alert.py SERVICE=conn,disk DURATION>1m REPEAT=20 COLOR=yellow,red RECOVERED TIME=W:0000:2359' ***
00016539 2008-07-24 13:52:16 Script alert with command '/home/hobbit/server/ext/mail-format-alert.py' and recipient SERVICE=conn,disk
00016539 2008-07-24 13:52:16 Matching host:service:page 'test2003:cpu:KC/KC-Test' against rule line 126
00016539 2008-07-24 13:52:16 *** Match with 'SCRIPT /home/hobbit/server/ext/mail-format-alert.py SERVICE=cpu,mem,tbntp,procs DURATION>1m REPEAT=20 COLOR=yellow,red RECOVERED TIME=W:0700:1700' ***
00016539 2008-07-24 13:52:16 Script alert with command '/home/hobbit/server/ext/mail-format-alert.py' and recipient SERVICE=cpu,mem,tbntp,procs
Note that it works properly with MAIL; the service doesn't match in line 128 and hobbit doesn't generate a MAIL event.
00018965 2008-07-24 13:52:16 Matching host:service:page 'test2003:cpu:KC/KC-Test' against rule line 127
00018965 2008-07-24 13:52:16 *** Match with 'MAIL ron at tradebotsystems.com SERVICE=cpu,mem,tbntp,procs DURATION>1m REPEAT=20 COLOR=yellow,red RECOVERED TIME=W:0700:1700' ***
00018965 2008-07-24 13:52:16 Mail alert with command 'mail -s "Hobbit [12345] test2003:cpu CRITICAL (RED)" ron at tradebotsystems.com'
00018965 2008-07-24 13:52:16 Matching host:service:page 'test2003:cpu:KC/KC-Test' against rule line 128
00018965 2008-07-24 13:52:16 Failed 'MAIL ron at tradebotsystems.com SERVICE=conn,disk DURATION>1m REPEAT=20 COLOR=yellow,red RECOVERED TIME=W:0000:2359' (service not in include list)
So what's going on with SERVICE and the SCRIPT event? Any help would be appreciated.