[hobbit] alerts not emailing
Henrik,

There is plenty of output from running that command - e.g.:

    @@page#677|1106919385.676695|10.2.216.252|bku005|disk|10.2.48.244|1106921185 |red|green|1106919385||187413
    status bku005.disk red Fri 28 Jan 13:36:25 2005 - Disk on bku005 at PANIC level
    &red /tmp (95%) has reached the defined disk space PANIC level (95%)
    /dev/lv00 139264 15136 124128 11% /tsg
    /dev/aixdoclv 319488 63000 256488 20% /aixdoc
    /dev/lv10 8192 2116 6076 26% /innogy
    /dev/hd4 98304 42824 55480 44% /
    /dev/hd9var 49152 24656 24496 51% /var
    /dev/lv01 409600 210716 198884 52% /bmc
    /dev/linuxlv 614400 369204 245196 61% /linux
    /dev/hd1 98304 60120 38184 62% /home
    /dev/lv09 524288 392520 131768 75% /maint
    /dev/lv11 917504 719512 197992 79% /downloads
    /dev/hd2 1294336 1149268 145068 89% /usr
    /dev/hd3 28672 27100 1572 95% /tmp
    @@

But still no alerts are being emailed and the page.log is not being updated.

Chris
-----Original Message-----
From: Henrik Stoerner [SMTP:henrik at hswn.dk]
Sent: Friday, January 28, 2005 1:25 PM
To: hobbit at hswn.dk
Subject: Re: [hobbit] alerts not emailing
On Fri, Jan 28, 2005 at 12:05:20PM -0000, Morris, Chris (Shared Services) wrote:
On Friday, January 28, 2005 1:54 AM, Bruce Lysik wrote :-
But I fail to get any alerts sent out. I've confirmed that email is working from this machine, and nothing shows up in /var/log/hobbit/page.log. (And nothing relevant to this issue in any of the logs there.)
Any help would be appreciated.
I am having the same problem. A disk exceeds the Alarm threshold and goes red on the hobbit display but hobbitd_alert takes no action to send a mail.
Could you try running the following command (login as the hobbit user):
~/server/bin/bbcmd --env=server/etc/hobbitserver.cfg hobbitd_channel --channel=page cat
Let it run for 5-10 minutes (long enough for the critical status to be updated) and let me know if there's any output.
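To make output like the samples in this thread easier to scan, the "@@page#..." header lines can be reduced to host, test and color, which makes it easy to compare the exact test name against the SERVICE= settings in hobbit-alerts.cfg. This is a minimal, hypothetical helper (not part of Hobbit); the field positions it uses (host, test and color as the 4th, 5th and 8th "|"-separated fields) are an assumption inferred from the sample output quoted in this thread, not from Hobbit documentation.

```python
# Hypothetical helper (not part of Hobbit): extract host, test and color from the
# "@@page#..." header lines printed by "hobbitd_channel --channel=page cat".
# ASSUMPTION: field positions 3 (host), 4 (test) and 7 (color), counted from 0
# after splitting on "|", are inferred from the sample output in this thread.
from typing import Iterable, Iterator, NamedTuple


class PageEvent(NamedTuple):
    host: str
    test: str
    color: str


def parse_page_headers(lines: Iterable[str]) -> Iterator[PageEvent]:
    """Yield a PageEvent for each '@@page#' header line; ignore everything else."""
    for line in lines:
        if not line.startswith("@@page#"):
            continue  # status text, df output and the closing "@@" marker
        # Strip stray whitespace such as the "1106921185 " seen in one sample.
        fields = [field.strip() for field in line.rstrip("\n").split("|")]
        if len(fields) < 8:
            continue  # too short to hold a color field; skip it
        yield PageEvent(host=fields[3], test=fields[4], color=fields[7])
```

Feeding it the bku005 header line from Chris's output yields PageEvent(host='bku005', test='disk', color='red').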
Henrik
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
**************************************************************************** The information contained in this email is intended only for the use of the intended recipient at the email address to which it has been addressed. If the reader of this message is not an intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination or copying of the message or associated attachments is strictly prohibited. If you have received this email in error, please contact the sender by return email or call 01793 877777 and ask for the sender and then delete it immediately from your system. Please note that neither RWE npower nor the sender accepts any responsibility for viruses and it is your responsibility to scan attachments (if any). *****************************************************************************
OK, can you mail me your hobbit-alerts.cfg file, then? I'm sure you already checked that hobbitd_alert is running - it's enabled by default, so it should be, but still...

Thanks,
Henrik

On Fri, Jan 28, 2005 at 01:48:41PM -0000, Morris, Chris (Shared Services) wrote:
-- Henrik Storner
Count me in on this. I've induced a couple of process failures on a test system and the email alerts aren't coming through. What other info should I provide or look for?

Tom

    -bash-2.05b$ ./bbcmd --env=/home/bb/hobbit/server/etc/hobbitserver.cfg hobbitd_channel --channel=page cat
    @@page#13|1106927004.323354|152.52.2.252|rfnd204d.nandomedia.com|cpu|0.0.0.0|1106928804|yellow|yellow|1106926704|web6|885572
    status rfnd204d,nandomedia,com.cpu yellow Fri Jan 28 10:43:24 EST 2005 up: 5 min, 0 users, 55 procs, load=0.12
    Warning: Machine recently rebooted
    LOAD AVG on rfnd204d,nandomedia,com is 0.12
    @@
    @@page#14|1106927078.016993|152.52.2.254|radm200p.nandomedia.com|procs|0.0.0.0|1106928878|red|red|1106925877|web1|388890
    status radm200p,nandomedia,com.procs red Fri Jan 28 10:44:38 EST 2005
    Some processes are in error
    &red redproc >=1 - not running, requires at least 1
    &yellow yellowproc >=1 - not running, requires at least 1
    @@

======================
My hobbit-alerts.cfg (email addresses were changed to protect the innocent):

    ##############################
    # Begin Nando Modifications
    #############################
    HOST=radm200p.nandomedia.com
        MAIL NOSPAM at nandomedia.com SERVICE=proc COLOR=yellow REPEAT=5m
        MAIL NOSPAM1 at nandomedia.com SERVICE=proc COLOR=red REPEAT=5m
    -bash-2.05b$
On Fri, Jan 28, 2005 at 10:46:52AM -0500, Tom Georgoulias wrote:
Count me in on this. I've induced a couple of process failures on a test system and the email alerts aren't coming through. What other info should I provide or look for?
The status message says:
status radm200p,nandomedia,com.procs red Fri Jan 28 10:44:38 EST 2005
so it is the "procs" column that is in error.
    HOST=radm200p.nandomedia.com
        MAIL NOSPAM at nandomedia.com SERVICE=proc COLOR=yellow REPEAT=5m
        MAIL NOSPAM1 at nandomedia.com SERVICE=proc COLOR=red REPEAT=5m
But here you have rules for the "proc" (no "s") column.
If I set up a config with these rules, but with SERVICE=procs, your message triggers an alert e-mail.
Methinks it would be nice to have a "test" option for the alert module, so you can run it with a hostname + testname as input, and it will tell you which rules match and which do not.
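The proposed "test" option is only an idea in this thread, but its core can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: the Rule class and explain_rules function are illustrative names, and each rule is modelled as an exact, case-sensitive comparison of host, service and color. The real hobbit-alerts.cfg matching may support more than this, so treat the sketch as an aid to spotting typos, not as Hobbit's actual logic.

```python
# Hypothetical sketch of the proposed alert "test" option (not part of Hobbit).
# ASSUMPTION: every rule is matched by exact, case-sensitive comparison of
# host, service and color; real hobbit-alerts.cfg matching may do more.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    host: str
    recipient: str
    service: Optional[str] = None  # None means "any service"
    color: Optional[str] = None    # None means "any color"


def explain_rules(
    rules: List[Rule], host: str, service: str, color: str
) -> List[Tuple[Rule, bool, str]]:
    """Return (rule, matched, reason) for every rule, in order."""
    results = []
    for rule in rules:
        if rule.host != host:
            results.append((rule, False, f"host {host!r} != {rule.host!r}"))
        elif rule.service is not None and rule.service != service:
            results.append((rule, False, f"service {service!r} != {rule.service!r}"))
        elif rule.color is not None and rule.color != color:
            results.append((rule, False, f"color {color!r} != {rule.color!r}"))
        else:
            results.append((rule, True, "matched"))
    return results
```

With the rules from Tom's configuration (SERVICE=proc), a red "procs" status on radm200p.nandomedia.com matches neither rule, and each result names the service mismatch; changing the service to "procs" makes the red rule match, which agrees with what Henrik observed.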
Henrik
Henrik Stoerner wrote:
But here you have rules for the "proc" (no "s") column.
Yup, I'm an idiot. Bitten by a typo, once again. Sorry to bother you about that. I like your idea about the test option. That would be a nice troubleshooting feature.
Tom
participants (3)
- CHRIS.MORRIS@RWEnpower.com
- henrik@hswn.dk
- tgeorgoulias@nandomedia.com