system reboot email alert
Hello All!
Times have changed a bit in the infrastructure environment such that many have a large contingent of virtual machines and blades. These hosts tend to reboot rather quickly, hence connectivity failures tend to not get noticed. I have a request on the table to send an email alert with a specific subject line to indicate that a host has rebooted.
I've seen a few posts over the years regarding requests to alert for system reboots. I also saw a post or two about adding a dynamic column capability. Some responses suggested an external script to accomplish the task of emailing an alert.
I'm asking the question again because as I peruse the analysis.cfg and review the rules, I know I can issue an email alert for PROC and DISK. That begs the question of 'why can't I issue an email alert for UP'? I'm fine with the 'yellow' on CPU test for recent reboot, but since it also includes load changes, I'm not interested in a generic email for either one when they occur.
If I already can, how??
And if not, suggestions for a simple approach? I'm not exactly getting the desired results server-wide with an external script that should just send an email for a CPU status color=yellow without interfering with our other configured alerts. :(
Is someone already successfully conquering this task?
NOTICE OF CONFIDENTIALITY: This message and any attachments contains confidential information belonging to the sender intended only for the use of the individual or entity named above. If you are not the intended recipient, be advised that copying, disclosure or reliance upon the contents is strictly prohibited. If you have received this message in error please notify the sender immediately.
This email has been scanned by the Symantec Email Security.cloud service. For more information please visit http://www.symanteccloud.com
Yes...but not with Xymon. I have a script that runs on startup that sends an email to alert me that the server rebooted with instructions for what to do (if needed). We used to have some services that required user specific ssh-agents to be running so this was written to let those teams know to log back in and restart the agents. Basically (from memory):
#!/bin/sh
# chkconfig: 35 88
# description: Reboot message
# Init script: mails a notice each time the host boots.
SENDMAIL="/usr/sbin/sendmail"
HOST=$(hostname)
NOW=$(date)
MAILTO="mail_me@example.com mail_me2@example.com"
# Headers first, then a blank line, then the message body.
cat <<EOF | $SENDMAIL $MAILTO
Subject: $HOST Rebooted!
To: $MAILTO
Content-Type: text/plain

ATTENTION! Reboot of $HOST has taken place.
EOF
I'm sure there are better ways and this could be more fleshed out but you will know immediately that a machine restarted.
=G=
From: Xymon <xymon-bounces at xymon.com> on behalf of Bauer-Lee, Sue <Sue.Bauer-Lee at Multiplan.com> Sent: Thursday, July 10, 2014 11:24 AM To: xymon at xymon.com Subject: [Xymon] system reboot email alert
Thanks so much. That kind of script I can write and will certainly accomplish the email alert issue.
I'd much prefer to use xymon: it knows the host rebooted so my question was directed at why we can't isolate that alert to report it via email/sms/whatever outside of JUST a column color change in the web interface .....
From: Galen Johnson [mailto:Galen.Johnson at sas.com] Sent: Thursday, July 10, 2014 1:40 PM To: Bauer-Lee, Sue; xymon at xymon.com Subject: RE: system reboot email alert
On Thu, Jul 10, 2014 at 11:24 AM, Bauer-Lee, Sue < Sue.Bauer-Lee at multiplan.com> wrote:
And if not, suggestions for a simple approach? I’m not exactly getting the desired results server-wide with an external script that should just send an email for a CPU status color=yellow without interfering with our other configured alerts. :(
You could right a server side extension script based on the [uptime] section of clientdata
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?
On Fri, Jul 11, 2014 at 2:20 PM, Asif Iqbal <vadud3 at gmail.com> wrote:
You could right a server side extension script based on the [uptime] section of clientdata
You could write..
On Fri, July 11, 2014 11:20 am, Asif Iqbal wrote:
You could right a server side extension script based on the [uptime] section of clientdata
Yeah, the state model does tend to break down when trying to interpret one-time events like this (monitoring systems tend to fall into one camp or the other philosophically).
Getting the data in is easy enough: either parse the output of uptime, or cat /proc/uptime to get it as a single value. The question is whether it's especially useful to "waste" an entire status column in memory just for this.
The one spot in xymon that does work around event-based tracking is the log file parser or 'msgs' test, which converts a point-in-time event into a 6x(runtime)-long stateful alert. As a quick hack, you could look for strings that occur in the log file only on a (normal) bootup and trigger those as a critical alert:
Something like...
analysis.cfg:
LOG /var/log/messages "kernel: bootmap" COLOR=red GROUP=hostrestart
LOG /var/log/messages "%(?-i)syslog.+start" COLOR=red GROUP=hostrestart

alerts.cfg:
SERVICE=msgs COLOR=red GROUP=hostrestart
    MAIL me at example.com
This is... totally untested, but I think it should work.
Longer-term, yeah, it would be nice to have a single "fake" status column that would accept non-stateful event alerts (or "modify" messages) and pass them through. It might make integration with other alerting systems easier.
HTH, -jc
On Fri, Jul 11, 2014 at 4:14 PM, J.C. Cleaver <cleaver at terabithia.org> wrote:
Getting the data in is easy enough: either parsing uptime, or cat-ing in /proc/uptime to get it in a single value, the question is whether it's especially useful to "waste" an entire status column in memory just for this.
[uptime] is already in the clientdata as part of default install.
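As a rough sketch of the /proc/uptime idea quoted above (not a tested Xymon extension), the check could look something like the following. The function names and the 30-minute threshold are made up for illustration, and it assumes a Linux host where /proc/uptime holds "<uptime seconds> <idle seconds>".

```shell
#!/bin/sh
# Sketch only: decide whether a host booted recently from its uptime.
# Function names and thresholds are illustrative, not part of Xymon.

# Prints the whole seconds of uptime from a /proc/uptime-style line,
# e.g. "350735.47 234388.90" gives 350735.
uptime_seconds() {
    # $1 = contents of /proc/uptime; drop everything from the first dot on
    printf '%s\n' "${1%%.*}"
}

# Succeeds when the uptime is below the boot limit.
recently_rebooted() {
    # $1 = uptime in seconds, $2 = boot limit in minutes
    [ "$1" -lt $(( $2 * 60 )) ]
}

# Example use on a Linux host:
# if recently_rebooted "$(uptime_seconds "$(cat /proc/uptime)")" 30; then
#     echo "recently rebooted"
# fi
```

A server-side extension would read the same value from the [uptime] section of the client data instead, where the format is the output of the uptime command and needs different parsing.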
Check out the UP setting in analysis.cfg (https://www.xymon.com/help/manpages/man5/analysis.cfg.5.html):
UP bootlimit toolonglimit [color]
The cpu status goes yellow/red if the system has been up for less than "bootlimit" time, or longer than "toolonglimit". The time is in minutes, or you can add h/d/w for hours/days/weeks - eg. "2h" for two hours, or "4w" for 4 weeks.
Defaults: bootlimit=1h, toolonglimit=-1 (infinite), color=yellow.
So, you could add "UP 30m -1 RED" to either the DEFAULT stanza or to select hosts. The CPU test will then have a "&red Machine recently rebooted" at the top (similar to this <https://www.xymon.com/xymon-cgi/historylog.sh?HOST=brahms.hswn.dk&SERVICE=cpu&TIMEBUF=Mon_Jul_7_22:59:08_2014>) when the host's uptime <= 30m.
Robert Herron robert.herron at gmail.com
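To make the time units concrete, here is a small illustrative helper that converts a limit such as "30m", "2h" or "4w" to minutes. It is not Xymon code: the function name is hypothetical, and the "m" suffix is assumed only because of the "30m" usage above.

```shell
#!/bin/sh
# Illustrative helper, not part of Xymon: convert an UP-style limit to minutes.
limit_to_minutes() {
    # $1 = limit such as "30m", "2h", "3d", "4w", or a bare number of minutes
    case "$1" in
        *m) echo "${1%m}" ;;                    # minutes, suffix assumed
        *h) echo $(( ${1%h} * 60 )) ;;          # hours
        *d) echo $(( ${1%d} * 1440 )) ;;        # days
        *w) echo $(( ${1%w} * 10080 )) ;;       # weeks
        *)  echo "$1" ;;                        # plain minutes, or -1
    esac
}

# Example: limit_to_minutes 2h
```

With this reading, the default bootlimit of 1h is 60 minutes and "UP 30m -1 RED" flags any host that has been up for 30 minutes or less.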
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
And the entry for alerts.cfg to send an email notification?
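One possible approach, offered as an untested sketch rather than a confirmed answer: have alerts.cfg hand cpu alerts to a SCRIPT recipient, and let the script mail only when the status text contains the "Machine recently rebooted" line, so load alerts are ignored. The sketch assumes the alert module exports BBHOSTNAME, BBSVCNAME, BBALPHAMSG, RECOVERED and RCPT to scripts, as described in the alerts.cfg man page. The function names and the subject wording are illustrative.

```shell
#!/bin/sh
# Hypothetical alert script for a Xymon SCRIPT recipient (untested sketch).
# Assumes BBHOSTNAME, BBSVCNAME, BBALPHAMSG, RECOVERED and RCPT are set
# by the alert module when it runs this script.

SENDMAIL="${SENDMAIL:-/usr/sbin/sendmail}"

# Succeeds only for a cpu status whose text carries the reboot marker.
is_reboot_alert() {
    # $1 = service name, $2 = full status text
    [ "$1" = "cpu" ] || return 1
    case "$2" in
        *"Machine recently rebooted"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Writes the mail (headers, blank line, body) to stdout.
build_reboot_mail() {
    # $1 = host name, $2 = recipient address
    printf 'To: %s\n' "$2"
    printf 'Subject: %s Rebooted!\n' "$1"
    printf 'Content-Type: text/plain\n\n'
    printf 'ATTENTION! Reboot of %s has taken place.\n' "$1"
}

# Act only on a new (non-recovery) alert; do nothing outside of Xymon.
if [ -n "${BBHOSTNAME:-}" ] && [ "${RECOVERED:-0}" = "0" ] &&
   is_reboot_alert "${BBSVCNAME:-}" "${BBALPHAMSG:-}"; then
    build_reboot_mail "$BBHOSTNAME" "${RCPT:-}" | "$SENDMAIL" "${RCPT:-}"
fi
```

The matching alerts.cfg rule would be something like a SERVICE=cpu rule with a SCRIPT recipient pointing at wherever this script is installed. Because the filtering happens in the script, it should not interfere with MAIL recipients configured for other alerts.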
From: Robert Herron [mailto:robert.herron at gmail.com] Sent: Saturday, July 12, 2014 8:17 AM To: Bauer-Lee, Sue Cc: xymon at xymon.com Subject: Re: [Xymon] system reboot email alert
participants (5)
- cleaver@terabithia.org
- Galen.Johnson@sas.com
- robert.herron@gmail.com
- Sue.Bauer-Lee@Multiplan.com
- vadud3@gmail.com