I think there are two false alerts:
- for each 'foo.cpu' alert I got paged twice, with the same ACK code.
- I shouldn't have been paged between 11:05 and 12:05, nor after 13:00, for 'foo.procs'.
Could you try running "bbcmd hobbitd_alert --test foo cpu"?
Of course:
$ $BBHOME/bin/bbcmd hobbitd_alert --test foo cpu
2005-02-15 14:59:22 Using default environment file ../etc/hobbitserver.cfg
Matching host:service:page 'foo:cpu:' against rule line 115:Matched
*** Match with 'HOST=foo TIME=W:0900:1800' ***
Matching host:service:page 'foo:cpu:' against rule line 116:Matched
*** Match with 'SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs' ***
Script alert with command '/tmp/alerte.sh' and recipient SERVICE=*
Matching host:service:page 'foo:cpu:' against rule line 117:Failed (min. duration)
Matching host:service:page 'foo:cpu:' against rule line 118:Failed (color)
Matching host:service:page 'foo:cpu:' against rule line 119:Failed (time criteria)
Here are lines 115 to 119 of my $BBHOME/etc/hobbit-alerts.cfg:
115 HOST=foo TIME=W:0900:1800
116 SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs
117 SCRIPT /tmp/alerte.sh SERVICE=disk DURATION>5m REPEAT=2h
118 SCRIPT /tmp/alerte.sh SERVICE=mem COLOR=yellow REPEAT=24h
119 SCRIPT /tmp/alerte.sh SERVICE=procs TIME=*:1145:1150,*:1205:1300 REPEAT=24h
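To make the time criteria in these rules easier to reason about, here is a minimal Python sketch of how a TIME value such as 'W:0900:1800' or '*:1145:1150,*:1205:1300' could be checked against a timestamp. It is not Hobbit's actual code. The function name in_time_spec is hypothetical, and the semantics are assumptions: '*' means every day, 'W' means Monday to Friday, digits 0-6 mean Sunday to Saturday, and both ends of each window are inclusive.

```python
from datetime import datetime


def in_time_spec(spec: str, when: datetime) -> bool:
    """Return True if `when` falls inside any window of a TIME-style spec.

    Assumed format (not taken from Hobbit's source): comma-separated
    windows of the form DAYS:HHMM:HHMM.
    """
    # Python counts Monday=0..Sunday=6; the assumed spec counts Sunday=0..Saturday=6.
    dow = (when.weekday() + 1) % 7
    hhmm = when.strftime("%H%M")

    for window in spec.split(","):
        days, start, end = window.split(":")
        if days == "*":
            day_ok = True              # every day of the week
        elif days == "W":
            day_ok = 1 <= dow <= 5     # Monday through Friday
        else:
            day_ok = str(dow) in days  # explicit day digits, e.g. "06"
        # Zero-padded HHMM strings compare correctly as plain strings.
        if day_ok and start <= hhmm <= end:
            return True
    return False


procs_spec = "*:1145:1150,*:1205:1300"
print(in_time_spec(procs_spec, datetime(2005, 2, 15, 11, 47)))  # inside first window
print(in_time_spec(procs_spec, datetime(2005, 2, 15, 13, 30)))  # after last window
```

Under these assumptions, a 'foo.procs' page is only expected from 11:45 to 11:50 and from 12:05 to 13:00, and only while the enclosing 'W:0900:1800' window on line 115 also matches.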
Also, if you add the option "--cfid" to the hobbitd_alert command line in hobbitlaunch.cfg, it will include the line number from the hobbit-alerts.cfg file with each alert. That should make it easier to track down which rules trigger an alert.
Done.
I just noticed this won't work for SCRIPT recipients, because the line number is put in the message subject, which scripts ignore. So drop that.
Undone ;-)
Regards,
--
Frédéric Mangeant