TIME alert problems (still)
Fellow Hobbitiers...
Some of you may remember an issue I raised a few weeks back about the TIME option in hobbit-alerts.cfg: TIME was not being honoured, so we were getting alerted during the blackout periods we had set. This still looks like an issue in the snapshot I downloaded 3 days after the beta release.
Has anyone got any other ideas? Is TIME only honoured for built-in checks, or something along those lines? It's the ext scripts that are causing the alerting (they send their results to Hobbit, and Hobbit does the actual alerting).
Some information...
notifications.log
Sun Jun 18 21:19:37 2006 xxxxx.aq (10.6.2.2) support [139] 1150661977 0
Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) sysalert [137] 1150662261 0
Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) support [139] 1150662261 0
Sun Jun 18 21:25:21 2006 xxxxx.aq (10.8.2.2) sysalert [137] 1150662321 0
Sun Jun 18 21:25:21 2006 xxxxx.aq (10.8.2.2) support [139] 1150662321 0
Sun Jun 18 22:20:01 2006 xxxxx.aq (10.6.2.2) sysalert [137] 1150665601 0
Sun Jun 18 22:20:01 2006 xxxxx.aq (10.6.2.2) support [139] 1150665601 0
hobbit-alerts.cfg
HOST=*
MAIL=sysalert SERVICE=aq FORMAT=PLAIN REPEAT=1h COLOR=yellow
MAIL=support SERVICE=aq COLOR=RED FORMAT=SMS DURATION>5 REPEAT=1h TIME=W:0900:1700 STOP
(These are lines 135 and 136, so it looks like Hobbit is ignoring them totally, although bb-hostsvc.sh shows them laid out properly, with the correct blackout times listed against the services.) As you can see from the information above, even though the aq service is set to alert only on W(eekdays) between 0900 and 1700, we were still getting alerts over the weekend.
I also have the same problem with another service; this one was just the easiest to get the information for.
Regards,
Mike Rowell
This email has been scanned for all viruses by the MessageLabs service.
On Mon, Jun 19, 2006 at 09:50:37AM +0100, Mike Rowell wrote:
> Sun Jun 18 21:19:37 2006 xxxxx.aq (10.6.2.2) support [139] 1150661977 0
> Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) sysalert [137] 1150662261 0
> Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) support [139] 1150662261 0
> Sun Jun 18 21:25:21 2006 xxxxx.aq (10.8.2.2) sysalert [137] 1150662321 0
> Sun Jun 18 21:25:21 2006 xxxxx.aq (10.8.2.2) support [139] 1150662321 0
> Sun Jun 18 22:20:01 2006 xxxxx.aq (10.6.2.2) sysalert [137] 1150665601 0
> Sun Jun 18 22:20:01 2006 xxxxx.aq (10.6.2.2) support [139] 1150665601 0
> HOST=*
> MAIL=sysalert SERVICE=aq FORMAT=PLAIN REPEAT=1h COLOR=yellow
> MAIL=support SERVICE=aq COLOR=RED FORMAT=SMS DURATION>5 REPEAT=1h TIME=W:0900:1700 STOP
> (these are lines 135 and 136 so it looks like it's ignoring them totally, although in bb-hostsvc.sh it shows them laid out properly with the correct blackout times listed against the services).
What's on lines 137 and 139 of the hobbit-alerts.cfg file? Those are the lines that trigger these alerts, as evidenced by the "[13x]" in the log entries.
Regards, Henrik
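The bracketed number Henrik points to can be pulled out of notifications.log mechanically, which makes it easy to see which config lines are actually firing. A minimal Python sketch, assuming the log layout seen in the sample lines in this thread (timestamp, host.service, IP in parentheses, recipient, config line in brackets, epoch time, trailing number). The field names and the helper functions are made up for illustration and are not part of Hobbit:

```python
import re
from collections import Counter

# Layout assumed from the sample lines quoted in this thread:
#   <timestamp> <host.service> (<ip>) <recipient> [<cfg line>] <epoch> <number>
LINE_RE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ [\d:]{8} \d{4}) "
    r"(?P<hostsvc>\S+) \((?P<ip>[^)]+)\) "
    r"(?P<recipient>\S+) \[(?P<cfg_line>\d+)\] "
    r"(?P<epoch>\d+) (?P<last>\d+)$"
)


def parse_notification(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["cfg_line"] = int(fields["cfg_line"])
    return fields


def count_by_cfg_line(lines):
    """Count logged alerts per hobbit-alerts.cfg line number."""
    counts = Counter()
    for line in lines:
        rec = parse_notification(line)
        if rec is not None:
            counts[rec["cfg_line"]] += 1
    return counts


SAMPLE = [
    "Sun Jun 18 21:19:37 2006 xxxxx.aq (10.6.2.2) support [139] 1150661977 0",
    "Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) sysalert [137] 1150662261 0",
    "Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) support [139] 1150662261 0",
]

if __name__ == "__main__":
    for cfg_line, n in sorted(count_by_cfg_line(SAMPLE).items()):
        print(f"line {cfg_line}: {n} alert(s)")
```

If every count lands on lines other than the ones carrying the TIME setting, the time restriction is probably working and some other recipient is sending the alert.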
Henrik,
On 137 and 139 we have the catch-alls for sysalert and support (support is our red address, and sysalert is where we send both red and yellow).
Mike
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
On Mon, Jun 19, 2006 at 10:38:17AM +0100, Mike Rowell wrote:
> Henrik,
> On 137 and 139 we have the catch alls for sysalert and support (support is our red address and sysalert is where we send both to).
Well, those catch-all rules are what trigger the alerts you don't want. They probably have an "UNMATCHED" setting? But that will also cause them to be applied when the rules above them are skipped due to time constraints.
In other words, if you have a setup like
HOST=myhost TEST=mytest MAIL dayshift at foo.com TIME=W:0800:1700
HOST=* MAIL support at foo.com UNMATCHED
then "support at foo.com" will get all myhost.mytest alerts that happen outside the weekdays-0800-1700 time window.
Regards, Henrik
Henrik,
So what you're saying is that when you have a TIME blackout window for a service, even if the last rule for that service has STOP after it, the alerts continue until it finds a rule it can send with?
That, if it is what you are saying, is not something I would be expecting. Just so you can see, these are lines 137 and 139:
MAIL=systems at some.domain COLOR=red,yellow REPEAT=1h FORMAT=PLAIN
MAIL=support-rightmove at some.domain COLOR=RED FORMAT=SMS DURATION>5 REPEAT=1h
Regards,
Mike
Hi Mike,
On Mon, Jun 19, 2006 at 11:55:53AM +0100, Mike Rowell wrote:
> So what you're saying is that when you have a TIME blackout window for a service, even if the last rule for that service has STOP after it, the alerts continue until it finds a rule it can send with?
Yes.
> That if it is what you are saying is not something I would be expecting.
OK, let me try and explain why that is. From your other email I gather your alert configuration (lines 134-139) is like this:
HOST=*
MAIL=sysalert SERVICE=aq FORMAT=PLAIN REPEAT=1h COLOR=yellow
MAIL=support SERVICE=aq COLOR=RED FORMAT=SMS DURATION>5 REPEAT=1h TIME=W:0900:1700 STOP
MAIL=systems at some.domain COLOR=red,yellow REPEAT=1h FORMAT=PLAIN
MAIL=support-rightmove at some.domain COLOR=RED FORMAT=SMS DURATION>5 REPEAT=1h
The STOP keyword means (from the man-page): "STOP Stop looking for more recipients after this one matches." So STOP only applies to rules that are positively matched (i.e. they did result in an alert being sent).
If STOP meant "after seeing this rule, whether it matched or not, stop looking for any more recipients" - then your two last lines (the "catch-all" rules) would never trigger because there's a STOP rule in front of them. And that is not what you would expect either.
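The behaviour described above can be reproduced with a toy model. The Python sketch below is not Hobbit's code: the `Recipient` class and `recipients_for` function are hypothetical, the time window is reduced to a single boolean, and SERVICE, DURATION and REPEAT are ignored. It also assumes that the TIME setting on line 136 restricts only that recipient. Under those assumptions it shows why a red alert outside office hours ends up on lines 137 and 139, matching the "[137]" and "[139]" entries in the log:

```python
from dataclasses import dataclass


@dataclass
class Recipient:
    cfg_line: int                    # config line number, as logged in "[13x]"
    address: str
    colors: tuple
    office_hours_only: bool = False  # stands in for TIME=W:0900:1700
    stop: bool = False               # stands in for STOP


def recipients_for(rules, color, in_office_hours):
    """Return the config lines that would send an alert, in order."""
    alerted = []
    for r in rules:
        if color not in r.colors:
            continue
        if r.office_hours_only and not in_office_hours:
            # The recipient is skipped, so its STOP is never reached.
            continue
        alerted.append(r.cfg_line)
        if r.stop:
            # STOP only takes effect after a positive match.
            break
    return alerted


RULES = [
    Recipient(135, "sysalert", ("yellow",)),
    Recipient(136, "support", ("red",), office_hours_only=True, stop=True),
    Recipient(137, "systems", ("red", "yellow")),
    Recipient(139, "support-rightmove", ("red",)),
]

if __name__ == "__main__":
    print("red, weekend:     ", recipients_for(RULES, "red", False))
    print("red, office hours:", recipients_for(RULES, "red", True))
```

In this model the STOP on line 136 shields the catch-alls only while line 136 itself matches; once the time window excludes it, evaluation simply carries on to the next recipient.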
I *think* that what you want is to have "sysalert" and "support" alerted on weekdays, and the "systems at ..." and "support-rightmove at ..." alerted outside this time window. May I suggest
TIME=W:0900:1700 SERVICE=aq
MAIL=sysalert COLOR=yellow FORMAT=PLAIN REPEAT=1h
MAIL=support COLOR=red FORMAT=SMS DURATION>5 REPEAT=1h
EXTIME=W:0900:1700
MAIL=systems at some.domain COLOR=red,yellow REPEAT=1h FORMAT=PLAIN
MAIL=support-rightmove at some.domain COLOR=red FORMAT=SMS DURATION>5 REPEAT=1h
Regards, Henrik
Thanks for this information Henrik,
One small problem: I'm running the 4.2 beta snapshot from a few days after release, and I'm getting this in the log files.
2006-06-19 14:56:58 Ignored unknown/unexpected token 'EXTIME=W:0900:1700' at line 131
2006-06-19 14:56:58 Ignored unknown/unexpected token 'EXTIME=*:0200:0700' at line 137
Can you let us know if it's the current snapshot we need to run to use this feature?
Regards,
Mike
On Mon, Jun 19, 2006 at 03:58:31PM +0100, Mike Rowell wrote:
> Thanks for this information Henrik,
> One small problem, I'm running the 4.2 beta snapshot from a few days after release, I'm getting this in the log files.
> 2006-06-19 14:56:58 Ignored unknown/unexpected token 'EXTIME=W:0900:1700' at line 131
> 2006-06-19 14:56:58 Ignored unknown/unexpected token 'EXTIME=*:0200:0700' at line 137
> Can you let us know if it's the current snapshot we need to run to use this feature?
Oops - sorry. I don't have an "EXTIME" keyword, since it's simple to do with just TIME:
EXTIME=W:0900:1700
should be TIME=W:1700:0900,06:0000:2359
Which tells me that EXTIME is more readable, so perhaps I should go and create that one...
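One way to check that TIME=W:1700:0900,06:0000:2359 covers what EXTIME=W:0900:1700 was meant to cover is to evaluate both specs against a few timestamps. The Python sketch below is a toy evaluator, not Hobbit's implementation. It assumes that "W" means Monday to Friday, "*" means every day, digits 0 to 6 mean Sunday to Saturday, that both ends of a window are inclusive, and that a window whose start is later than its end wraps past midnight (which Henrik's suggestion appears to rely on). The function name `within_timespec` is made up for illustration:

```python
from datetime import datetime


def within_timespec(spec, when):
    """Return True if `when` falls inside any days:start:end part of `spec`."""
    now = when.hour * 100 + when.minute
    # Python counts Monday as 0; convert to Sunday = 0 ... Saturday = 6.
    wday = (when.weekday() + 1) % 7
    for part in spec.split(","):
        days, start, end = part.split(":")
        start, end = int(start), int(end)
        day_ok = any(
            d == "*" or (d == "W" and 1 <= wday <= 5) or d == str(wday)
            for d in days
        )
        if not day_ok:
            continue
        if start <= end:
            if start <= now <= end:
                return True
        elif now >= start or now <= end:
            # Assumed wrap-around: e.g. 1700:0900 means evening plus early morning.
            return True
    return False


OFFICE = "W:0900:1700"
OUTSIDE = "W:1700:0900,06:0000:2359"

if __name__ == "__main__":
    samples = [
        datetime(2006, 6, 18, 21, 19),  # Sunday evening, as in the log
        datetime(2006, 6, 19, 14, 56),  # Monday afternoon
        datetime(2006, 6, 19, 2, 0),    # Monday, early morning
    ]
    for ts in samples:
        print(ts, within_timespec(OFFICE, ts), within_timespec(OUTSIDE, ts))
```

With inclusive ends, both specs match at exactly 09:00 and 17:00 on weekdays in this model, so the two windows overlap at the boundaries rather than forming a strict complement.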
Regards, Henrik