On Thu, Jun 16, 2005 at 02:28:53PM -0700, Bruce Lysik wrote:
My best suggestion would be to use the bbcombotest tool to define a pseudo "host" with the combined status of your host "pool".
E.g. if you're monitoring http on 5 hosts, you could define a combination test like this:
Pool1.http=(hostA.http+hostB.http+hostC.http+hostD.http+hostE.http)>3
That would give you a red alert if 3 or fewer hosts in the pool were green. And you could then trigger an alert based on that test result.
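To make the arithmetic of that combination test concrete, here is a minimal Python sketch of the same rule. It is not how bbcombotest is implemented. The function name `pool_status` and the dictionary layout are hypothetical, and it assumes that only a green test counts as 1, which follows the wording above rather than the tool's actual handling of other colors.

```python
def pool_status(colors, threshold=3):
    """Return 'green' if more than `threshold` tests are green, else 'red'.

    `colors` maps a host name to the color of its http test.
    Assumption: only 'green' counts toward the sum; every other color counts as 0.
    """
    green_count = sum(1 for color in colors.values() if color == "green")
    return "green" if green_count > threshold else "red"


# Hypothetical pool of five hosts, three of them green.
statuses = {
    "hostA": "green",
    "hostB": "green",
    "hostC": "green",
    "hostD": "red",
    "hostE": "red",
}
print(pool_status(statuses))  # 3 is not greater than 3, so this prints "red"
```

With four or five green hosts the sum exceeds 3 and the pseudo-host stays green.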
Pretty unwieldy when you have large pools of servers, however.
Could be, yes.
I just started writing a smart paging script which will keep track of downed hosts and decide whether or not to page.
I'm interested to know if this kind of alerting is generally useful. I suspect it might be ... if so, then we should devise a way of defining such alerts directly in Hobbit instead of forcing you to come up with scripts that work around this.
Perhaps one solution could be to implement a new kind of rule for the hobbit-alerts file. Currently all of the rules are matched against a specific host+test combination; we could define a type of rule that could be matched against all of the host+test statuses that are in an alerting stage, and then have the rule trigger based on some criteria for how many matches we get.
Something like
HOST=%(www.*).foo.com TEST=http COLOR=red COUNT>=5 MAIL someone@foo.com
The "COUNT>=5" would then cause this rule to trigger only if 5 or more hosts matching www.*.foo.com had a red http test. You could even combine this with other criteria, say, a threshold of 5 during the daytime and 10 during off-hours.
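The counting logic behind such a rule could look roughly like the Python sketch below. COUNT is only a proposed keyword at this point, so nothing here reflects existing Hobbit code. The function `count_rule_triggers`, the `(host, test)` dictionary keys, and the use of an unanchored regex search are all assumptions made for illustration.

```python
import re


def count_rule_triggers(statuses, host_pattern, test, color, min_count):
    """Return True if at least `min_count` hosts match the rule.

    `statuses` maps (host, test) tuples to a color string.
    Assumption: the host pattern is applied as an unanchored regex search.
    """
    host_re = re.compile(host_pattern)
    matching = [
        host
        for (host, test_name), status in statuses.items()
        if test_name == test and status == color and host_re.search(host)
    ]
    return len(matching) >= min_count


# Hypothetical data: five www hosts and one mail host, all with a red http test.
statuses = {(f"www{i}.foo.com", "http"): "red" for i in range(1, 6)}
statuses[("mail.foo.com", "http")] = "red"

print(count_rule_triggers(statuses, r"(www.*).foo.com", "http", "red", 5))  # True
```

A time-dependent threshold would only change the `min_count` argument passed in, for example 5 during the daytime and 10 during off-hours.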
I can foresee a problem in handling recovery notifications for this kind of alert, but that's something I'll have to think about.
Would that be useful?
One question I have so far is: Does hobbit wait for an alerting script to return before continuing to evaluate other rules?
Paging scripts are serialized, yes - Hobbit will wait for a paging script to complete before continuing down the list of alert rules.
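As a rough illustration of what serialized execution means for a paging script, the Python sketch below runs a list of commands one after another and waits for each to exit. It is not Hobbit's code. `run_paging_scripts` and the placeholder commands are hypothetical stand-ins for real paging scripts.

```python
import subprocess
import sys


def run_paging_scripts(commands):
    """Run each command in order, waiting for it to exit before starting the next."""
    exit_codes = []
    for cmd in commands:
        # subprocess.run blocks until the child process has finished.
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
    return exit_codes


# Placeholder "scripts": the first sleeps briefly, the second exits at once.
commands = [
    [sys.executable, "-c", "import time; time.sleep(0.2)"],
    [sys.executable, "-c", "pass"],
]
print(run_paging_scripts(commands))
```

The practical consequence is that a slow paging script delays every rule that comes after it, so a script that tracks downed hosts should return quickly.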
Regards, Henrik