Hi. A while ago, we upgraded to 4.3.15. It seems like the alert repeat setting isn't working: only the first alert is sent. We have an on-call person who receives the first alert via SMS after 7 minutes. It should then repeat every 15 minutes. The rest of the team gets their first alert after 22 minutes.
Example conf (e-mail and phone numbers masked):
SCRIPT /usr/local/xymon/server/ext/html_mail.pl alarms at domain.tld EXSERVICE=conn REPEAT=1d RECOVERED FORMAT=PLAIN
SCRIPT /usr/local/xymon/server/ext/html_mail.pl alarms at domain.tld SERVICE=conn DURATION>1 REPEAT=1d RECOVERED FORMAT=PLAIN
SCRIPT /usr/local/bin/sendsms.sh 111111 DURATION>7 FORMAT=SMS REPEAT=15
SCRIPT /usr/local/bin/sendsms.sh 222222 DURATION>22 FORMAT=SMS REPEAT=15
SCRIPT /usr/local/bin/sendsms.sh 333333 DURATION>22 FORMAT=SMS REPEAT=15
SCRIPT /usr/local/bin/sendsms.sh 444444 DURATION>22 FORMAT=SMS REPEAT=15
From the notification log:
Mon Feb 10 05:43:15 2014 web01.apache2 (123.123.123.123) alarms at domain.tld 1392007395 0
Mon Feb 10 05:51:15 2014 web01.apache2 (123.123.123.123) 111111 1392007875 0
Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 222222 1392008717 0
Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 333333 1392008717 0
Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 444444 1392008717 0
Strangely though, it seems like it was working on Feb 5, which was also after the upgrade. The only change made since then is the patch for xymonnet, and I don't see how this could affect the alerts.
Regards, Johan
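To check whether REPEAT is taking effect, it can help to measure the time between consecutive notifications for each recipient. The following is a minimal Python sketch, not part of Xymon. The line layout in the regular expression is an assumption inferred from the log excerpt in this message, and the function name repeat_gaps is hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the excerpt in this thread:
#   <weekday> <month> <day> <HH:MM:SS> <year> <host.service> (<ip>) <recipient> <epoch> <number>
# The recipient may contain spaces (e.g. a masked e-mail address).
LINE_RE = re.compile(
    r"^\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4} "
    r"(?P<target>\S+) \((?P<ip>[^)]*)\) "
    r"(?P<recipient>.+) (?P<epoch>\d+) (?P<last>\d+)$"
)


def repeat_gaps(lines):
    """Return {(target, recipient): [seconds between consecutive alerts]}."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # lines that do not fit the assumed layout are skipped
            seen[(m["target"], m["recipient"])].append(int(m["epoch"]))
    gaps = {}
    for key, times in seen.items():
        times.sort()
        gaps[key] = [b - a for a, b in zip(times, times[1:])]
    return gaps


# The five lines from the notification log above.
sample = [
    "Mon Feb 10 05:43:15 2014 web01.apache2 (123.123.123.123) alarms at domain.tld 1392007395 0",
    "Mon Feb 10 05:51:15 2014 web01.apache2 (123.123.123.123) 111111 1392007875 0",
    "Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 222222 1392008717 0",
    "Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 333333 1392008717 0",
    "Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 444444 1392008717 0",
]

for (target, recipient), gap_list in repeat_gaps(sample).items():
    print(target, recipient, gap_list)
```

For this excerpt every recipient has an empty gap list, because each one appears only once. With REPEAT=15 taking effect, one would expect gaps of roughly 900 seconds for the SMS recipients.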
On 2014-02-10 8:18, Johan Sjöberg wrote:
[snip original report, config and notification log]
There are no changes to how alerts work in either 4.3.15 or 4.3.16.
I copied your configuration into a 4.3.16 system, and REPEAT is working fine here:
$ tail -f notifications.log
Mon Feb 10 09:39:58 2014 webmail.hswn.dk.conn (0.0.0.0) root[3] 1392021598 500
Mon Feb 10 09:46:16 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392021976 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-2[5] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-3[6] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-4[7] 1392022917 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-2[5] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-3[6] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-4[7] 1392023826 500
(My "root" recipient is your first recipient; the "root-X" recipients are your "111111", "222222" etc.)
You didn't list the history log for the web01.apache2 service. Are you sure that it was red all of the time? Any green status will reset the REPEAT interval, which could explain why you don't see the repeats.
Running xymond_alert with the "--debug" option will log a lot of data about how alert messages are handled. It would be nice to have this if the problem re-occurs.
Regards, Henrik
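To illustrate the point about a green status resetting the REPEAT interval, here is a toy Python model of when one recipient would be alerted. It is a sketch under stated assumptions, not xymond_alert's actual logic: each red period is treated as a new event, the first alert goes out once the event is older than the DURATION value, and repeats follow at the REPEAT interval until the status leaves red. The function name expected_alerts and the minute-based timeline are hypothetical.

```python
def expected_alerts(red_periods, duration_min, repeat_min):
    """List the minutes at which one recipient would be alerted.

    red_periods: list of (start, end) minutes during which the status is red.
    Simplified model (an assumption, not xymond_alert's real implementation):
    every red period starts a fresh event, so both the DURATION clock and
    the REPEAT clock start over after any green status.
    """
    alerts = []
    for start, end in red_periods:
        t = start + duration_min
        while t < end:
            alerts.append(t)
            t += repeat_min
    return alerts


# Red for a full hour: on-call (DURATION>7) and team (DURATION>22), REPEAT=15.
print(expected_alerts([(0, 60)], 7, 15))    # [7, 22, 37, 52]
print(expected_alerts([(0, 60)], 22, 15))   # [22, 37, 52]

# Same hour, but the status is green between minute 30 and minute 31.
print(expected_alerts([(0, 30), (31, 60)], 7, 15))   # [7, 22, 38, 53]
print(expected_alerts([(0, 30), (31, 60)], 22, 15))  # [22, 53]
```

In this model, a short green after minute 22 still lets the 22-minute recipients get their first alert, but delays their next one until the new event is 22 minutes old. Real timestamps will also drift by the alert scan interval, as the 15 to 16 minute gaps in the log above show.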
-----Original Message-----
From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of henrik at hswn.dk
Sent: 10 February 2014 10:22
To: xymon at xymon.com
Subject: Re: [Xymon] Alert REPEAT not working in 4.3.15.
[snip quoted message]
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
If it wasn't red the whole time, the recipients with the 22-minute delay wouldn't have received any alerts. It also happened for two different alerts during the night. I will check if I can reproduce it by forcing a red alert. Should I add the debug flag to tasks.cfg to enable it?
Regards, Johan
On 2014-02-10 10:47, Johan Sjöberg wrote:
[snip]
Either that, or toggle it on the fly by running "kill -USR2 `pidof xymond_alert`" (you can see in the alert.log file that debugging was enabled).
Regards, Henrik
-----Original Message-----
From: henrik at hswn.dk [mailto:henrik at hswn.dk]
Sent: 10 February 2014 10:52
To: Johan Sjöberg
Cc: xymon at xymon.com
Subject: RE: [Xymon] Alert REPEAT not working in 4.3.15.
[snip quoted message]
Of course it worked now, so I couldn't get anything useful. Will keep an eye on this.
/Johan