Hi all
I'm playing with alerts and Hobbit 4.0-rc2, and I must say the ease of use is fantastic ! The only problem is that I can't see my alerts in the "info" column.
My $BBHOME/etc/hobbit-alerts.cfg contains this :
HOST=foo TIME=W:0900:1800
SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs
SCRIPT /tmp/alerte.sh SERVICE=disk DURATION>5m REPEAT=2h
SCRIPT /tmp/alerte.sh SERVICE=mem COLOR=yellow REPEAT=24h
SCRIPT /tmp/alerte.sh SERVICE=procs TIME=*:1145:1150 REPEAT=24h
Alerts work fine, but I have this in the "info" column :
"No e-mail/SMS alerting defined"
Any hint ?
Thanks in advance.
Regards,
--
Frédéric Mangeant
On Tue, Feb 15, 2005 at 11:57:03AM +0100, Frédéric Mangeant wrote:
> Hi all
> I'm playing with alerts and Hobbit 4.0-rc2, and I must say the ease of use is fantastic !
Thanks :-)
> The only problem is that I can't see my alerts in the "info" column.
Known mis-feature. The "info" generator cannot handle the Hobbit alert configuration right now.
Regards, Henrik
> > The only problem is that I can't see my alerts in the "info" column.
> Known mis-feature. The "info" generator cannot handle the Hobbit alert configuration right now.
Thanks for your answer. I've changed the subject of this mail because I think some alerts don't work as expected.
With this $BBHOME/etc/hobbit-alerts.cfg :
HOST=foo TIME=W:0900:1800
SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs
SCRIPT /tmp/alerte.sh SERVICE=disk DURATION>5m REPEAT=2h
SCRIPT /tmp/alerte.sh SERVICE=mem COLOR=yellow REPEAT=24h
SCRIPT /tmp/alerte.sh SERVICE=procs TIME=*:1145:1150,*:1205:1300 REPEAT=24h
I received these alerts :
15/02/2005 11:38:33 foo.mem = yellow (ACK :125406)
15/02/2005 11:39:33 foo.disk = red (ACK :419182)
15/02/2005 11:45:13 foo.procs = red (ACK :992240)
15/02/2005 12:03:46 foo.procs = red (ACK :143469)
15/02/2005 13:39:50 foo.disk = red (ACK :78043)
15/02/2005 14:03:50 foo.procs = red (ACK :408423)
15/02/2005 14:09:50 foo.cpu = yellow (ACK :844373)
15/02/2005 14:09:50 foo.cpu = yellow (ACK :844373)
15/02/2005 14:11:50 foo.cpu = yellow (ACK :589672)
15/02/2005 14:11:50 foo.cpu = yellow (ACK :589672)
I think there are 2 false errors :
- for each 'foo.cpu' alert I got paged twice, with the same ACK code.
- I shouldn't have been paged between 11h05 and 12h05, nor after 13h00, for 'foo.procs'
Any clue ?
Thanks...
--
Frédéric Mangeant
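As a rough cross-check of the report above, the three foo.procs timestamps can be compared against the two windows in TIME=*:1145:1150,*:1205:1300. This is only a sketch: the function names are made up for illustration, and it assumes the bounds are compared as plain HHMM numbers with both ends inclusive, which may not be exactly how hobbitd_alert evaluates them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hobbit): succeeds if the HHMM value in $1
# lies inside one of the windows from TIME=*:1145:1150,*:1205:1300.
# Assumption: both ends of each window are inclusive.
in_procs_window() {
    t=$1
    if [ "$t" -ge 1145 ] && [ "$t" -le 1150 ]; then return 0; fi
    if [ "$t" -ge 1205 ] && [ "$t" -le 1300 ]; then return 0; fi
    return 1
}

# Classify the three foo.procs alert times listed in the message above.
check_procs_alerts() {
    for stamp in 11:45:13 12:03:46 14:03:50; do
        # Turn HH:MM:SS into HHMM by keeping characters 1-2 and 4-5.
        hhmm=$(echo "$stamp" | cut -c1-2,4-5)
        if in_procs_window "$hhmm"; then
            echo "$stamp inside"
        else
            echo "$stamp outside"
        fi
    done
}

check_procs_alerts
```

Under those assumptions only the 11:45:13 page falls inside a window; the 12:03:46 and 14:03:50 pages fall outside both.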
In <UQX708PZ3AFGXC02XC at mailvel.velizy.fr.sterianet> Frédéric Mangeant <frederic.mangeant at steria.com> writes:
> some alerts don't work as expected.
[snip config and summary of sent alerts]
> I think there are 2 false errors :
> - for each 'foo.cpu' alert I got paged twice, with the same ACK code.
> - I shouldn't have been paged between 11h05 and 12h05, nor after 13h00, for 'foo.procs'
Could you try running "bbcmd hobbitd_alert --test foo cpu" ?
Also, if you add the option "--cfid" to the hobbitd_alert command line in hobbitlaunch.cfg, it will include the line number of the hobbit-alerts.cfg file with each alert. That should make it easier to track down which rules trigger an alert.
Regards, Henrik
In <cusuvl$s7c$1 at voodoo.hswn.dk> Henrik Storner <henrik at hswn.dk> writes:
> Also, if you add the option "--cfid" to the hobbitd_alert command line in hobbitlaunch.cfg, it will include the line number of the hobbit-alerts.cfg file with each alert. That should make it easier to track down which rules trigger an alert.
I just noticed this won't work for SCRIPT recipients, because it's put in the message subject which scripts ignore. So drop that.
Henrik
> > I think there are 2 false errors :
> > - for each 'foo.cpu' alert I got paged twice, with the same ACK code.
> > - I shouldn't have been paged between 11h05 and 12h05, nor after 13h00, for 'foo.procs'
> Could you try running "bbcmd hobbitd_alert --test foo cpu" ?
Of course :
$ $BBHOME/bin/bbcmd hobbitd_alert --test foo cpu
2005-02-15 14:59:22 Using default environment file ../etc/hobbitserver.cfg
Matching host:service:page 'foo:cpu:' against rule line 115:Matched
*** Match with 'HOST=foo TIME=W:0900:1800' ***
Matching host:service:page 'foo:cpu:' against rule line 116:Matched
*** Match with 'SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs' ***
Script alert with command '/tmp/alerte.sh' and recipient SERVICE=*
Matching host:service:page 'foo:cpu:' against rule line 117:Failed (min. duration)
Matching host:service:page 'foo:cpu:' against rule line 118:Failed (color)
Matching host:service:page 'foo:cpu:' against rule line 119:Failed (time criteria)
Here are lines 115 to 119 of my $BBHOME/etc/hobbit-alerts.cfg :
115 HOST=foo TIME=W:0900:1800
116 SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs
117 SCRIPT /tmp/alerte.sh SERVICE=disk DURATION>5m REPEAT=2h
118 SCRIPT /tmp/alerte.sh SERVICE=mem COLOR=yellow REPEAT=24h
119 SCRIPT /tmp/alerte.sh SERVICE=procs TIME=*:1145:1150,*:1205:1300 REPEAT=24h
> Also, if you add the option "--cfid" to the hobbitd_alert command line in hobbitlaunch.cfg, it will include the line number of the hobbit-alerts.cfg file with each alert. That should make it easier to track down which rules trigger an alert.
Done.
> I just noticed this won't work for SCRIPT recipients, because it's put in the message subject which scripts ignore. So drop that.
Undone ;-)
Regards,
--
Frédéric Mangeant
On Tue, Feb 15, 2005 at 02:36:26PM +0100, Frédéric Mangeant wrote:
> I think there are 2 false errors :
> - for each 'foo.cpu' alert I got paged twice, with the same ACK code.
> - I shouldn't have been paged between 11h05 and 12h05, nor after 13h00, for 'foo.procs'
I've tried, but I cannot make this happen on my own setup.
Could you send me the script you use for alerting, and the ~hobbit/data/ack/notifications.log file ?
Regards, Henrik
Hi Henrik
> I've tried, but I cannot make this happen on my own setup.
> Could you send me the script you use for alerting, and the ~hobbit/data/ack/notifications.log file ?
Well, I moved to another server, on which I cleanly installed Hobbit 4.0-rc2 + patches, and can't seem to reproduce the problem.
Anyway, here's my tiny paging script :
$ cat /tmp/alert.sh
#!/bin/sh
# Timestamp as dd/mm/yyyy<TAB>HH:MM:SS
DATE=`date +%d/%m/%Y%t%H:%M:%S`
echo "$DATE $BBHOSTNAME.$BBSVCNAME = $BBCOLORLEVEL (ack : $ACKCODE, recovered : $RECOVERED)" >> /tmp/alert.txt
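A paging script like the one above can be exercised by hand, outside Hobbit, by setting the environment variables it reads. The sketch below is only an illustration: the variable names are the ones used in the script, but the values, the temporary directory, and the copy of the script written into it are made up for the dry run.

```shell
#!/bin/sh
# Dry run of a paging script without Hobbit. Everything lives in a temporary
# directory so the real /tmp/alert.txt is left alone.
workdir=$(mktemp -d)
log="$workdir/alert.txt"

# Write a copy of the script that logs to $log instead of /tmp/alert.txt.
# Escaped \$ signs are expanded when the copy runs, not when it is written.
cat > "$workdir/alert.sh" <<EOF
#!/bin/sh
DATE=\$(date "+%d/%m/%Y%t%H:%M:%S")
echo "\$DATE \$BBHOSTNAME.\$BBSVCNAME = \$BBCOLORLEVEL (ack : \$ACKCODE, recovered : \$RECOVERED)" >> "$log"
EOF
chmod +x "$workdir/alert.sh"

# Sample values only; a real alert would get these from Hobbit.
BBHOSTNAME=foo BBSVCNAME=procs BBCOLORLEVEL=red ACKCODE=123456 RECOVERED=0 \
    "$workdir/alert.sh"

# Show what the script logged.
cat "$log"
```

If the logged line has the expected host, service, colour and ack code, any remaining oddity is more likely in the alert rules than in the script.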
I did some more testing, there seem to be 2 small problems :
- Warning when the format of a script is missing
With this rule :
$ cat $BBHOME/etc/hobbit-alerts.cfg
HOST=fmangeant SERVICE=* EXSERVICE=procs,disk,mem,svcs REPEAT=24h TIME=W:0900:1800 SCRIPT /tmp/alert.sh FORMAT=TEXT
HOST=fmangeant SERVICE=disk DURATION>2m SCRIPT /tmp/alert.sh
I get a warning :
$ $BBHOME/bin/bbcmd hobbitd_alert --test fmangeant disk
2005-02-16 15:22:03 Using default environment file /BB/hobbit/server/etc/hobbitserver.cfg
2005-02-16 15:22:03 Ignoring SCRIPT with no recipient at line 2
Matching host:service:page 'fmangeant:disk:' against rule line 1:Failed (service excluded)
Matching host:service:page 'fmangeant:disk:' against rule line 2:Failed (min. duration)
If I add the format of the script, like this :
$ cat $BBHOME/etc/hobbit-alerts.cfg
HOST=fmangeant SERVICE=* EXSERVICE=procs,disk,mem,svcs REPEAT=24h TIME=W:0900:1800 SCRIPT /tmp/alert.sh FORMAT=TEXT
HOST=fmangeant SERVICE=disk DURATION>2m SCRIPT /tmp/alert.sh FORMAT=text
$ $BBHOME/bin/bbcmd hobbitd_alert --test fmangeant disk
2005-02-16 15:22:54 Using default environment file /BB/hobbit/server/etc/hobbitserver.cfg
Matching host:service:page 'fmangeant:disk:' against rule line 1:Failed (service excluded)
Matching host:service:page 'fmangeant:disk:' against rule line 2:Failed (min. duration)
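The "Ignoring SCRIPT with no recipient" warning, together with test output in this thread that reports "recipient SERVICE=*", suggests that the word following the script path is read as the recipient. Assuming that reading is right, a small lint like the one below could flag SCRIPT rules where that word is missing or looks like a KEYWORD=value setting. The function name and the heuristic are illustrative only and are not part of Hobbit.

```shell
#!/bin/sh
# Hypothetical lint for a hobbit-alerts.cfg style file given as $1.
# Assumption: a rule has the form "SCRIPT <path> <recipient>", so the second
# word after SCRIPT should exist and should not contain "=".
check_script_recipients() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i != "SCRIPT") continue
            if (i + 2 > NF) {
                print "line " NR ": SCRIPT has no recipient"
            } else if ($(i + 2) ~ /=/) {
                print "line " NR ": recipient looks like a keyword: " $(i + 2)
            }
        }
    }' "$1"
}

# Sample input: the two rules that produced the warning.
sample=$(mktemp)
cat > "$sample" <<'EOF'
HOST=fmangeant SERVICE=* EXSERVICE=procs,disk,mem,svcs REPEAT=24h TIME=W:0900:1800 SCRIPT /tmp/alert.sh FORMAT=TEXT
HOST=fmangeant SERVICE=disk DURATION>2m SCRIPT /tmp/alert.sh
EOF

check_script_recipients "$sample"
rm -f "$sample"
```

On this sample it reports line 1 as having a recipient that looks like a keyword (FORMAT=TEXT) and line 2 as having no recipient, which matches the line number in the warning.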
- Repeat interval not correctly taken into account
I tried to repeat an alert every 5 minutes :
$ cat $BBHOME/etc/hobbit-alerts.cfg
HOST=fmangeant SERVICE=* EXSERVICE=procs,disk,mem,svcs REPEAT=24h TIME=W:0900:1800 SCRIPT /tmp/alert.sh FORMAT=TEXT
HOST=fmangeant SERVICE=disk DURATION>2m SCRIPT /tmp/alert.sh FORMAT=TEXT
HOST=fmangeant SERVICE=procs REPEAT=5m SCRIPT /tmp/alert.sh FORMAT=TEXT
$ $BBHOME/bin/bbcmd hobbitd_alert --test fmangeant procs
2005-02-16 15:23:59 Using default environment file /BB/hobbit/server/etc/hobbitserver.cfg
Matching host:service:page 'fmangeant:procs:' against rule line 1:Failed (service excluded)
Matching host:service:page 'fmangeant:procs:' against rule line 2:Failed (min. duration)
Matching host:service:page 'fmangeant:procs:' against rule line 3:Matched
*** Match with 'HOST=fmangeant SERVICE=procs REPEAT=5m SCRIPT /tmp/alert.sh FORMAT=TEXT' ***
Script alert with command '/tmp/alert.sh' and recipient FORMAT=TEXT
But I got paged every 30 minutes :
$ cat /tmp/alert.txt
16/02/2005 14:43:27 fmangeant.procs = red (ack : 145155, recovered : 0)
16/02/2005 15:13:30 fmangeant.procs = red (ack : 145155, recovered : 0)
Is it possible to use any repeat value ?
Thanks in advance.
Regards,
--
Frédéric Mangeant
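To put a number on the repeat interval reported above, the gap between the two logged pages can be computed from their timestamps. A minimal sketch follows; the helper name is made up, and it assumes both timestamps fall on the same day, as they do in the log shown.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight. One leading zero is stripped
# from each field because $(( )) would otherwise treat "08" as invalid octal.
to_seconds() {
    h=${1%%:*}; rest=${1#*:}
    m=${rest%%:*}; s=${rest#*:}
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# The two page times from /tmp/alert.txt in the message above.
first=$(to_seconds 14:43:27)
second=$(to_seconds 15:13:30)
gap=$(( second - first ))

echo "gap: $gap s ($(( gap / 60 )) min $(( gap % 60 )) s)"
```

For these two entries the gap is 1803 seconds, that is 30 minutes and 3 seconds, rather than the 5 minutes requested with REPEAT=5m.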
participants (2)
- frederic.mangeant@steria.com
- henrik@hswn.dk