On Fri, 2005-06-17 at 08:01 +0200, Henrik Stoerner wrote:
Something like
HOST=%(www.*).foo.com TEST=http COLOR=red COUNT>=5 MAIL someone at foo.com
The "COUNT>=5" would then cause this rule to trigger only if there were 5 or more hosts named www.*.foo.com, whose http tests are red. You could even combine this with other criteria, say have a threshold of 5 during the daytime, and 10 during off-hours.
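To make the proposed semantics concrete, here is a minimal Python sketch of how such a count-based rule might be evaluated. It is not Hobbit code, and "COUNT>=5" is only a proposal at this point. The status tuples, the function names, the unanchored regex match, and the 08:00-18:00 "daytime" window are all assumptions made for illustration.

```python
import re
from datetime import time

# Hypothetical status records: (hostname, test name, current color).
# This data layout is an illustration only, not Hobbit's internal format.

def count_matching(statuses, host_pattern, test, color):
    """Count hosts whose name matches host_pattern and whose test has the given color."""
    rx = re.compile(host_pattern)
    return sum(
        1
        for host, t, c in statuses
        if t == test and c == color and rx.search(host)  # unanchored match (assumption)
    )

def threshold_for(now, day_threshold=5, night_threshold=10,
                  day_start=time(8, 0), day_end=time(18, 0)):
    """Pick the threshold for the time of day; the daytime window is assumed."""
    return day_threshold if day_start <= now < day_end else night_threshold

def rule_triggers(statuses, now):
    """True if enough www.*.foo.com hosts have a red http test at time 'now'."""
    n = count_matching(statuses, r"(www.*).foo.com", "http", "red")
    return n >= threshold_for(now)

# Example data: five red www hosts, one red non-www host, one green www host.
statuses = [(f"www{i}.foo.com", "http", "red") for i in range(5)]
statuses += [("db1.foo.com", "http", "red"), ("www9.foo.com", "http", "green")]
```

With this sample data, rule_triggers(statuses, time(12, 0)) is true because five hosts meet the daytime threshold of 5, while rule_triggers(statuses, time(23, 0)) is false because the off-hours threshold is 10.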
I can foresee a problem in handling recovery notifications for this kind of alert, but that's something I'll have to think about.
Would that be useful?
The main place I would use it would be NTP alerts. If one router loses NTP, I'm not terribly worried. If 10-20 of them all fail at once, then I know something really bad is happening... Maybe both GPS clocks lost sync and all 4 cesium backups failed, or NTP locked up on a core router and I need to make fewer downstream nodes dependent on that one.
I would also consider using it for purple alerts. I don't want individual purples for most of my stuff, but if there are a lot of them (>100), then I know I killed MRTG and I should page on that.
Daniel J McDonald, CCIE # 2495, CNX Austin Energy
dan.mcdonald at austinenergy.com