On Sat, Mar 19, 2005 at 10:33:09AM -0600, Daniel J McDonald wrote:
I'm still flummoxed by hobbit-alerts. I'm certain I broke something, because I am not getting any alerts from the box.
It's probably a config error ...
The only logs in /var/log/hobbit/page.log are:
2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument
These are harmless, and often occur when Hobbit is shut down or restarted.
I see a couple of those in the hobbitlaunch.log file as well. I also see the following error:
2005-03-19 10:14:21 Task bbdisplay started with PID 7417
2005-03-19 10:14:21 Task bbretest started with PID 7418
2005-03-19 10:14:29 Our child has failed and will not talk to us
2005-03-19 10:14:36 Our child has failed and will not talk to us
That's a first - and you're right, the error message should be more detailed. I've fixed that. But it generally means that one of the hobbitd helper tasks has stopped responding.
Here is a sample host that is not paging. The info page lists:
Service  Recipient                             1st Delay  Stop after  Repeat  Time of Day  Colors
conn     dan.mcdonald at austinenergy.com (R)  30m        -           5d      -            red
telnet   dan.mcdonald at austinenergy.com (R)  30m        -           5d      -            red
Both telnet and conn have been down on this host for over two hours.
The salient rule is:
HOST=%. MAIL=dan.mcdonald at austinenergy.com REPEAT=140h DURATION>30m RECOVERED COLOR="red" UNMATCHED
Your "HOST=%." is wrong: the "." matches exactly one character, so the rule will only match hostnames that are a single character long (do you really have a host named "a"?). If you want to match all hosts, use "HOST=%.*" or the simple form "HOST=*".
So some other rule must be generating the info-column output you have, and therefore even if your HOST entry was correct, the rule would not trigger because of the UNMATCHED restriction.
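To illustrate the HOST pattern point, here is a rough Python sketch of the matching behaviour described above. It is not Hobbit's actual code: the function name host_matches and the sample hostnames are made up for the example, and it assumes that the expression after "%" must match the whole hostname, which is what "only matches hostnames with exactly one letter" implies.

```python
import re


def host_matches(pattern: str, hostname: str) -> bool:
    """Simplified model of a HOST= pattern check (hypothetical helper)."""
    if pattern == "*":
        # Simple form: matches every host.
        return True
    if pattern.startswith("%"):
        # Assumption: the text after "%" is a regular expression that
        # has to match the entire hostname.
        return re.fullmatch(pattern[1:], hostname) is not None
    # In this sketch, anything else is compared as a literal hostname.
    return pattern == hostname


# Made-up hostnames, for illustration only.
hosts = ["a", "www.example.com"]

for pattern in ("%.", "%.*", "*"):
    matched = [h for h in hosts if host_matches(pattern, h)]
    print(pattern, "matches", matched)
```

Under that assumption, "%." only picks up the one-character host, while "%.*" and "*" pick up both.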
Could you try running
exec ~hobbit/server/bin/bbcmd hobbitd_alert --test HOSTNAME conn "" 120 red
That should tell you how the alert is handled, and who gets notified using what rules.
Regards, Henrik