On Wed, October 7, 2015 11:58 pm, Gavin Stone-Tolcher wrote:
Hi, we are seeing unusual alerting behaviour with a Xymon 4.3.21 server using a "holidays.cfg" with HOLIDAYLIKEWEEKDAY=0.
We have a network operations team (uqnoc-sms) that gets alerts during business hours (TIME=W:0800:1700) and a data networks team (dn-sms) that gets out-of-business-hours alerts in certain windows (TIME=W:0600:0759,W:1701:2200,60:0600:2200).
Rules are like:
PAGE=$UNSMSREGEX EXHOST=$UNEXCLUDE
    MAIL uqnoc-sms at xx.yy.edu.au SERVICE=$UNSMSSVCS DURATION>6m TIME=W:0800:1700 COLOR=red REPEAT=1w FORMAT=SMS RECOVERED
    MAIL dn-sms at xx.yy.edu.au SERVICE=$UNSMSSVCS DURATION>6m TIME=W:0600:0759,W:1701:2200,60:0600:2200 COLOR=red REPEAT=1w FORMAT=SMS RECOVERED
For a "red" conn test covered by the rule on a weekday public holiday, Xymon seems to correctly skip the alert to "uqnoc-sms" (TIME=W:0800:1700) and instead correctly generates an alert to "dn-sms" (via the TIME=60:0600:2200 component), but it then keeps sending the same alert approximately every minute (my xymonnet poll cycle). Is REPEAT=1w being ignored?
Before I try and debug much further, I thought I would ask if anyone else has seen similar behaviour?
Hmm. Does the REPEAT value work with a smaller interval (such as 1d or 1h)? And what type of system are you running on?
I'm curious whether there's a REPEAT overflow or underflow going on, rather than something specific to the alert flipping back and forth between the two TIME windows.
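One way to check the REPEAT theory would be a temporary copy of the dn-sms recipient with a shorter interval. This is only a sketch reusing the variables from your rule; the 1h value is an arbitrary choice for testing, not a recommendation:

```
# Test only: REPEAT shortened from 1w to 1h, everything else unchanged
PAGE=$UNSMSREGEX EXHOST=$UNEXCLUDE
    MAIL dn-sms at xx.yy.edu.au SERVICE=$UNSMSSVCS DURATION>6m TIME=W:0600:0759,W:1701:2200,60:0600:2200 COLOR=red REPEAT=1h FORMAT=SMS RECOVERED
```

If alerts then arrive hourly instead of every poll cycle, that would point at how the 1w value is handled rather than at the holiday TIME matching.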
Is the test persistently red with no spurious recoveries being generated during the period in question?
-jc