We have xymon set to page four tiers of support after appropriate delays, like so:
    MAIL alert1 REPEAT=10 COLOR=red,purple FORMAT=SMS
    MAIL alert2 DURATION>20 REPEAT=10 COLOR=red,purple FORMAT=SMS
    MAIL alert3 DURATION>40 REPEAT=10 COLOR=red,purple FORMAT=SMS
    MAIL alert4 DURATION>60 REPEAT=10 COLOR=red,purple FORMAT=SMS
HOWEVER, there are two circumstances in which xymon pages ALL the tiers IMMEDIATELY:
- when a status has been yellow for some time, and then turns red
- when a server has gone red during a scheduled maintenance window
IMHO this is undesirable behavior. Having a server go from yellow to red, or having a server come out of a maintenance window, should 'reset' the alert timer to zero and page only the first tier. We don't ever want all of the tiers to be paged the moment something turns red.
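To make the difference concrete, here is a rough Python model of the two behaviors. It is only a sketch, not xymon code. It assumes the DURATION clock starts when the status first leaves green (or keeps running through the maintenance window) rather than when the status turns red. That is our guess from the symptoms, not something we have confirmed. The names TIERS, tiers_to_page, observed and desired are made up for illustration.

```python
# Thresholds in minutes, copied from the DURATION> values in our rules.
# alert1 has no DURATION setting, so it is paged as soon as an alert fires.
TIERS = [("alert1", None), ("alert2", 20), ("alert3", 40), ("alert4", 60)]


def tiers_to_page(elapsed_minutes):
    """Return the recipients whose DURATION threshold has been exceeded."""
    return [
        name
        for name, threshold in TIERS
        if threshold is None or elapsed_minutes > threshold
    ]


def observed(minutes_since_green, minutes_red):
    # ASSUMPTION: the timer counts from when the status first left green,
    # so time spent yellow (or inside a maintenance window) is included.
    return tiers_to_page(minutes_since_green)


def desired(minutes_since_green, minutes_red):
    # What we would like: the timer counts only the time spent red.
    return tiers_to_page(minutes_red)


if __name__ == "__main__":
    # Yellow for 90 minutes, then turned red just now.
    print("observed:", observed(90, 0))
    print("desired: ", desired(90, 0))
```

With a status that has been yellow for 90 minutes and has only just turned red, observed() returns all four recipients while desired() returns only alert1, which is the gap we are asking about.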
Thoughts? Any way around this?
This is seriously ticking off our tier 4 person, who has now been paged multiple times for things that the tier 1 folks are prepared to handle in a timely manner.