I tried a couple of these, and it says it's sending mail to me, but there is nothing in the log...
Ah wait, there is something in the log: Postfix got munged when an updated mailman RPM was installed on the box. But it should still have queued the message.
I'll see if anything goes down today. Probably will...

-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: Sunday, March 20, 2005 7:23 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] alerts still not alerting
On Sat, Mar 19, 2005 at 10:33:09AM -0600, Daniel J McDonald wrote:
I'm still flummoxed by hobbit-alerts. I'm certain I broke something, because I am not getting any alerts from the box.
It's probably a config error ...
The only logs in /var/log/hobbit/page.log are:

2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument
These are harmless, and often occur when Hobbit is shut down or restarted.
I see a couple of those in the hobbitlaunch.log file as well. I also see the following error:

2005-03-19 10:14:21 Task bbdisplay started with PID 7417
2005-03-19 10:14:21 Task bbretest started with PID 7418
2005-03-19 10:14:29 Our child has failed and will not talk to us
2005-03-19 10:14:36 Our child has failed and will not talk to us
That's a first, and you're right that the error message should be more detailed. I've fixed that. In general, though, it means that one of the hobbitd helper tasks has stopped responding.
Here is a sample host that is not paging. The info page lists:

Service  Recipient                             1st Delay  Stop after  Repeat  Time of Day  Colors
conn     dan.mcdonald at austinenergy.com (R)  30m        -           5d      -            red
telnet   dan.mcdonald at austinenergy.com (R)  30m        -           5d      -            red
Both telnet and conn have been down on this host for over two hours.
The salient rule is:

HOST=%.
    MAIL=dan.mcdonald at austinenergy.com REPEAT=140h DURATION>30m RECOVERED COLOR="red" UNMATCHED
Your "HOST=" is wrong: it will only match hostnames consisting of exactly one character (do you really have a host named "a"?). If you want to match all hosts, use "HOST=%.*" or the simple form "HOST=*".
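To illustrate why "%." only catches one-character hostnames, here is a minimal Python sketch. It assumes, as the reply above implies, that a "%"-prefixed HOST value is a regular expression that must match the whole hostname; the helper name host_matches() and the sample hostnames are hypothetical, and plain names are simply compared literally as a simplification.

```python
import re

def host_matches(pattern: str, hostname: str) -> bool:
    """Hypothetical helper: evaluate a HOST= value as described in the reply."""
    if pattern == "*":
        # The simple form matches every host.
        return True
    if pattern.startswith("%"):
        # '%' marks a regular expression; assume it must match the entire hostname.
        return re.fullmatch(pattern[1:], hostname) is not None
    # Simplification: any other value is compared as a literal hostname.
    return pattern == hostname

hosts = ["a", "www1", "mailserver"]
for pat in ["%.", "%.*", "*"]:
    print(pat, [h for h in hosts if host_matches(pat, h)])
```

Under that assumption, "%." selects only the host named "a", while "%.*" and "*" select all three.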
So some other rule must be generating the info-column output you have. Therefore, even if your HOST entry were correct, this rule would not trigger, because of the UNMATCHED restriction.
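The UNMATCHED point can be modelled roughly as follows. This Python sketch assumes that an UNMATCHED recipient is alerted only when no other recipient received the alert for the same event; the Recipient class, recipients_for() function and the example.com addresses are hypothetical, and the host, service and time-of-day filters the real alerter applies are left out.

```python
from dataclasses import dataclass

@dataclass
class Recipient:
    address: str
    colors: set             # e.g. {"red"}, like COLOR=red
    min_duration_min: int   # like DURATION>30m
    unmatched: bool = False # like UNMATCHED: fallback only

def recipients_for(event_color, down_minutes, recipients):
    """Sketch: choose who gets an alert for one event (hypothetical model)."""
    def eligible(r):
        return event_color in r.colors and down_minutes > r.min_duration_min

    # First pass: ordinary recipients.
    normal = [r.address for r in recipients if not r.unmatched and eligible(r)]
    if normal:
        # Someone else already gets the alert, so UNMATCHED recipients are skipped.
        return normal
    # Second pass: UNMATCHED recipients act only as a fallback.
    return [r.address for r in recipients if r.unmatched and eligible(r)]

other = Recipient("oncall@example.com", {"red"}, 30)
fallback = Recipient("fallback@example.com", {"red"}, 30, unmatched=True)
print(recipients_for("red", 120, [other, fallback]))
print(recipients_for("red", 120, [fallback]))
```

With another matching rule present, only oncall@example.com is chosen; the UNMATCHED recipient is used only when it is the sole match.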
Could you try running
exec ~hobbit/server/bin/bbcmd hobbitd_alert --test HOSTNAME conn "" 120 red
That should tell you how the alert is handled, and who gets notified using what rules.
Regards, Henrik