Hi,
We use the RECOVERED keyword for all recipients defined in hobbit-alerts.cfg.
We noticed a problem for hosts where alerting for a given service is excluded during a certain time window. When a problem occurs on the service outside the exclusion window, the yellow/red alerts are sent. When the problem is resolved, however, no recovery confirmation message/SMS is sent. The issue does not depend on how long the service was down.
Example configuration and logs:
----hobbit-alerts.cfg----
... ...
Do not send anything for given service(s) during period of time
HOST=test3 SERVICE=http TIME=*:0305:0315
... ...
Rules by administrator
HOST=test3
    MAIL test at example.com REPEAT=24h RECOVERED
    SCRIPT /usr/local/sendsms 0123456789 COLOR=red FORMAT=SMS REPEAT=24h RECOVERED
... ...

-----notification.log-----
Mon May 22 10:23:54 2006 test3.http (13.22.8.8) test.example at com 1148286234 600
Mon May 22 10:24:34 2006 test3.http (13.22.8.8) 0123456789 1148286234 600
... ...

------histfile for test3----------
Last 50 log entries (Full HTML log)
Date                      Status  Duration
Mon May 22 10:24:15 2006  green   0:40:50
Mon May 22 10:23:54 2006  red     0:00:21
Is this a bug, or is something wrong with the exclusion specification?
Thanks
Dominique UNIL - University of Lausanne
On Mon, May 22, 2006 at 11:16:00AM +0200, Dominique Frise wrote:
> ----hobbit-alerts.cfg----
> ... ...
> Do not send anything for given service(s) during period of time
> HOST=test3 SERVICE=http TIME=*:0305:0315
> ... ...
> Rules by administrator
> HOST=test3
>     MAIL test at example.com REPEAT=24h RECOVERED
>     SCRIPT /usr/local/sendsms 0123456789 COLOR=red FORMAT=SMS REPEAT=24h RECOVERED
If I understand your configuration snippet correctly, then this is a configuration error. You shouldn't have rules with no recipients, like the first one you have shown here.
> Is this a bug, or is something wrong with the exclusion specification?
Your exclusion is wrong. It should be (notice the TIME setting):
HOST=test3 TIME=*:0315:0305
    MAIL test at example.com REPEAT=24h RECOVERED
    SCRIPT /usr/local/sendsms 0123456789 COLOR=red FORMAT=SMS REPEAT=24h RECOVERED
Regards, Henrik
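
[Editor's sketch] The reversed window in Henrik's rule (0315 to 0305) reads as "every time of day except 03:05 to 03:15". The short Python sketch below illustrates that reading. It is not Hobbit's code: the function name in_window is hypothetical, and both the wrap-past-midnight behaviour and the half-open boundaries are assumptions made for illustration.

```python
def in_window(now: str, start: str, end: str) -> bool:
    """Return True if the HHMM time `now` lies in the window [start, end).

    Assumption: a window whose start is later than its end wraps past
    midnight. The half-open boundary is a guess, not Hobbit's documented
    behaviour.
    """
    n, s, e = int(now), int(start), int(end)
    if s <= e:
        # Ordinary window within a single day.
        return s <= n < e
    # Reversed window: matches from `start` to midnight and from midnight to `end`.
    return n >= s or n < e


# Compare the exclusion window with the reversed window from the reply.
for t in ("0300", "0310", "0320", "1023"):
    print(t, in_window(t, "0305", "0315"), in_window(t, "0315", "0305"))
```

Under these assumptions the two windows are complements of each other, so a rule carrying TIME=*:0315:0305 applies at 10:23 (when the alert in the log was sent) but not at 03:10.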
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
Thank you for these explanations.
That means it is not possible to write simple rules for excluding alerts for a given service for all hosts (HOST=*) during a period of time? Do we really have to write the same exclude/include rules for each host?
Dominique UNIL - University of Lausanne
On Tue, May 30, 2006 at 04:32:30PM +0200, Dominique Frise wrote:
> That means it is not possible to write simple rules for excluding alerts for a given service for all hosts (HOST=*) during a period of time? Do we really have to write the same exclude/include rules for each host?
Not at all. If that's what you want to do, put this at the top of your rules list:
TIME=*:0305:0315 IGNORE
Regards, Henrik
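
[Editor's sketch] Why the position of the IGNORE rule matters can be illustrated with a toy model of top-to-bottom rule evaluation. This is a simplified sketch, not Hobbit's implementation: the Rule class and the recipients function are hypothetical, services are left out, and it assumes that a matching IGNORE stops the search for further recipients.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    recipient: str        # "IGNORE" or a recipient such as "MAIL ..."
    host: str = "*"       # "*" matches every host
    start: str = "0000"   # window start, HHMM
    end: str = "2400"     # window end, HHMM (exclusive in this sketch)


def recipients(rules, host, now):
    """Walk the rules in order and collect the recipients that match.

    Assumption: a matching IGNORE ends the search, so it only suppresses
    recipients that come after it in the list.
    """
    matched = []
    for rule in rules:
        if rule.host not in ("*", host):
            continue
        # Zero-padded HHMM strings compare correctly as plain strings.
        if not (rule.start <= now < rule.end):
            continue
        if rule.recipient == "IGNORE":
            break
        matched.append(rule.recipient)
    return matched


RULES = [
    Rule("IGNORE", start="0305", end="0315"),
    Rule("MAIL test at example.com", host="test3"),
    Rule("SCRIPT /usr/local/sendsms 0123456789", host="test3"),
]

print(recipients(RULES, "test3", "0310"))
print(recipients(RULES, "test3", "1023"))
```

In this model, moving the IGNORE entry to the end of RULES would let both recipients match at 03:10, which is why the reply says to put it at the top of the rules list.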
Thanks again for this other tip.
I think I did not fully understand the IGNORE setting :-)
All our rules are now set up (hopefully) correctly, and recovery messages are sent when they should be :-)
Dominique UNIL - University of Lausanne
participants (2): Dominique.Frise@unil.ch, henrik@hswn.dk