Hello,
For most of our servers, we have a MAIL statement that sends e-mails out 24x7 for failures. But then we have another MAIL statement in the same rule that e-mails another group, and that should only happen 7AM to 5PM.
My question is, if the failure happens at 3AM, e-mail #1 goes out. If it is still down at 7AM, shouldn't e-mail #2 go out?
Right now it appears that only failures that start between 7AM and 5PM trigger the second e-mail. A failure that starts overnight never triggers it, even if the server is still down at 7AM.
Could this be because I've increased my ALERTREPEAT="1440" in hobbitserver.cfg?
PAGE=vpn
    MAIL me at domain.com
    MAIL another at domain.com DURATION>5m TIME=*:0700:1700
    SCRIPT qpage.sh qp-me FORMAT=SMS DURATION>30 TIME=*:0700:1700
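The behaviour expected from this rule can be sketched as a small Python model. This is only an illustration of the expectation, not Hobbit's code: `in_window`, `eligible`, `who_should_be_mailed` and the `RECIPIENTS` table are hypothetical names, only the all-days `*:HHMM:HHMM` form of TIME is handled, and treating the end of the window as exclusive is an assumption.

```python
from datetime import datetime, timedelta

def in_window(now, spec):
    """True if `now` is inside a '*:HHMM:HHMM' window; None means no restriction."""
    if spec is None:
        return True
    days, start, end = spec.split(":")
    if days != "*":
        raise ValueError("this sketch only handles the all-days form")
    hhmm = now.strftime("%H%M")
    # Zero-padded HHMM strings compare correctly as text.
    return start <= hhmm < end  # exclusive end is an assumption

def eligible(now, failed_since, min_duration, spec):
    """A recipient qualifies once the outage is long enough and the window matches."""
    long_enough = min_duration is None or now - failed_since > min_duration
    return long_enough and in_window(now, spec)

# Hypothetical table mirroring the two MAIL lines: (name, DURATION, TIME)
RECIPIENTS = [
    ("me", None, None),
    ("another", timedelta(minutes=5), "*:0700:1700"),
]

def who_should_be_mailed(now, failed_since):
    return [name for name, dur, spec in RECIPIENTS
            if eligible(now, failed_since, dur, spec)]

failed_since = datetime(2006, 11, 1, 3, 0)   # failure starts at 3AM
print(who_should_be_mailed(datetime(2006, 11, 1, 3, 0), failed_since))
print(who_should_be_mailed(datetime(2006, 11, 1, 7, 0), failed_since))
```

Under these assumptions only the first recipient qualifies at 3AM, and both qualify at 7AM if the failure is still ongoing, which is the escalation I would expect.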
Thanks, Shane
On Wed, Nov 01, 2006 at 07:43:33AM -0500, Shane Presley wrote:
> For most of our servers, we have a MAIL statement that sends e-mails out 24x7 for failures. But then we have another MAIL statement in the same rule that e-mails another group, and that should only happen 7AM to 5PM.
> My question is, if the failure happens at 3AM, e-mail #1 goes out. If it is still down at 7AM, shouldn't e-mail #2 go out?
> Could this be because I've increased my ALERTREPEAT="1440" in hobbitserver.cfg?
I think it could be related. I haven't found out exactly what happens, but it looks like the second e-mail is not being considered until the repeat-interval for the first e-mail triggers. In other words, after sending email #1, the rules for email #2 are ignored for 1440 minutes.
Which - obviously - is not what should happen.
Could you check whether swapping those two MAIL lines changes anything? If my suspicion is right, then it should work if you put the time-restricted MAIL line first.
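To spell out that suspicion, here is a toy Python model. It is a guess at the kind of logic that would produce this symptom, not Hobbit's actual implementation: `simulate`, `stop_on_suppressed` and the recipient lists are made-up names, and the outage is stepped in whole hours for simplicity.

```python
from datetime import datetime, timedelta

REPEAT = timedelta(minutes=1440)       # ALERTREPEAT from the question
START = datetime(2006, 11, 1, 3, 0)    # outage begins at 3AM

def in_window(now, window):
    """window is None (always) or a (start_hour, end_hour) pair."""
    return window is None or window[0] <= now.hour < window[1]

# Recipients in rule order; the names are placeholders.
ORIGINAL = [("me", None), ("another", (7, 17))]
SWAPPED = list(reversed(ORIGINAL))

def simulate(recipients, stop_on_suppressed, hours=12):
    """Return [(time, recipient)] for every mail sent during the outage."""
    sent, next_allowed = [], {}
    for h in range(hours + 1):
        now = START + timedelta(hours=h)
        for name, window in recipients:
            if name in next_allowed and now < next_allowed[name]:
                if stop_on_suppressed:
                    break      # suspected: later lines are never looked at
                continue       # intended: skip only this recipient
            if in_window(now, window):
                sent.append((now.strftime("%H:%M"), name))
                next_allowed[name] = now + REPEAT
    return sent

print(simulate(ORIGINAL, stop_on_suppressed=True))
print(simulate(ORIGINAL, stop_on_suppressed=False))
print(simulate(SWAPPED, stop_on_suppressed=True))
```

In this model the first call mails only "me" at 03:00, while the other two also mail "another" at 07:00. That is why the order of the MAIL lines would matter if the suspicion is correct.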
Regards, Henrik
On 11/1/06, Henrik Stoerner <henrik at hswn.dk> wrote:
> I think it could be related. I haven't found out exactly what happens, but it looks like the second e-mail is not being considered until the repeat-interval for the first e-mail triggers. In other words, after sending email #1, the rules for email #2 are ignored for 1440 minutes.
> Which - obviously - is not what should happen.
> Could you try if swapping those two MAIL lines changes anything? If my suspicion is right, then it should work if you have the time-restricted MAIL line first.
I spent a while troubleshooting this today, and found that it does appear to be related to the REPEAT options, but I can't find a way to make it work.
- Changing the order of the lines (placing the time-restricted line first) didn't help. It would still e-mail me at the first instance of the failure, but then not move on to the time-restricted statement when that time hit.
- I tried lowering the ALERTREPEAT value in hobbitserver.cfg, restarting Hobbit, and then using REPEAT= statements individually for each MAIL statement, and it still behaved the same. It would notify me, but fail to send a notification once the time statement was matched.
- I tried removing the local REPEAT= statements and keeping a low ALERTREPEAT in hobbitserver.cfg, and of course that worked fine. It would e-mail me (and repeat as needed) and would also e-mail the second user once the time statement was matched.
- I then even tried to be tricky, creating two individual rules:

PAGE=vpn
    MAIL me at domain.com REPEAT=1400
PAGE=vpn
    MAIL another at domain.com TIME=*:0700:1700 REPEAT=1400
And that still failed. It would notify me at domain.com, but never trigger the another at domain.com rule, even once the time hit.
So it appears there is no way to make escalation work once I try to turn off the repeat function?
Unless I'm doing something wrong...
Thanks, Shane