Green status during a total blackout
Hi,
My Xymon server (running Hobbit 4.2.2) was shut down for 6 hours. When the power came back, Xymon came back too. I checked all the monitored equipment, and everything showed green for the whole 6 hours. I was expecting purple or white, because no data was reported during that time. Is this a bug, or is there a setting I can adjust?
Thanks.
It won't turn purple until it runs again, about 5 minutes. That is, if it doesn't obtain data that is more than 30 minutes old before it runs.
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
On Mon, 25 Jul 2011 11:28:30 +0200, "L.M.J" <linuxmasterjedi at free.fr> wrote:
When the power came back, Xymon came back too. I've checked all monitored equipment, everything was green during the past 6h. I was expecting purple or white color because no data has been reported during the time. Is that a bug or a setting I could adjust ?
Hi,
Has nobody had this issue? My boss asked me when the power outage happened, and I could not tell him: everything stayed green the whole time :-/
-- LMJ "May the source be with you my young padawan" http://sites.google.com/site/imatruelinuxmasterjedi/
Status colour has to change explicitly, i.e. by notification of a status change. If the server died when the power went off, and came back when everything else was back again, it would have no reason to show that anything had changed. Purple only happens when a display update finds expired information. If Xymon had been running the whole time (e.g. on a UPS), you would not have seen all green.
David.
-- David Baldwin - Assistant Director, Infrastructure (acting) Information and Communication Technology Services Australian Sports Commission http://ausport.gov.au Tel 02 62147830 Fax 02 62141830 PO Box 176 Belconnen ACT 2616 david.baldwin at ausport.gov.au Leverrier Street Bruce ACT 2617
When you had the outage, did your Xymon server go down too? If it did, then what you are seeing makes sense.
Xymon looks for status changes. If everything went down, including Xymon, then it would never have received any messages, nor been able to perform any other tests, like ping etc.
I am guessing that when power was restored, you brought up all your other servers first, and then started Xymon. Or they all started up around the same time.
When it came back up, it would have assigned the last known status to each server. Last known status of all servers was green. At this point, all pings and other tests would return green, so there is no status change to report. So no errors. (At worst, I would expect your CPU columns to go yellow, for an hour, with the "Machine recently rebooted" message)
The only way to prevent this in future is to make sure your Xymon server is on a UPS. Better still, on a different UPS to the rest of your kit. That way, it will stay up, and happily report that everything else had gone to hell in a handbasket.
If your manager needs it, you can probably get an idea of the start and end time of the outage from your messages files and other logs. They will all show a boot message at around the time of the power recovery. Checking the last timestamp before the recovery will give you an idea of the outage start time. Check a few servers; the latest value you find is the correct one.
The root cause of your issue, or rather the lack of issue, is that Xymon never received any reports of problems. Because it couldn't.
Hope that helps.
Regards Vernon
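The log-based estimate Vernon describes can be sketched in a few lines of Python. This is only an illustration: the host names and timestamps are made up, and taking the earliest boot message as the end of the outage is an assumption, not something stated in the thread.

```python
from datetime import datetime


def estimate_outage(last_entry_before, first_boot_after):
    """Estimate (start, end) of an outage from per-host log timestamps.

    last_entry_before: host -> timestamp of the last log line before the gap
    first_boot_after:  host -> timestamp of the first boot message after it
    """
    # The latest "last entry" across hosts is the closest lower bound
    # on the moment power was lost.
    start = max(last_entry_before.values())
    # Assumption: the earliest boot message approximates power coming back.
    end = min(first_boot_after.values())
    return start, end


# Hypothetical values, for illustration only.
last_entry_before = {
    "web1": datetime(2000, 1, 1, 2, 58),
    "db1": datetime(2000, 1, 1, 3, 4),
}
first_boot_after = {
    "web1": datetime(2000, 1, 1, 9, 10),
    "db1": datetime(2000, 1, 1, 9, 7),
}

start, end = estimate_outage(last_entry_before, first_boot_after)
print("outage from", start, "to", end, "lasting", end - start)
```

The more servers you sample, the tighter the estimate, since each one only gives a lower bound on the start time.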
On Wed, 27 Jul 2011 14:00:44 +0800, Vernon Everett wrote:
When you had the outage, did your Xymon server go down too? If it did, then what you are seeing makes sense.
Xymon looks for status changes. If everything went down, including Xymon, then it would never have received any messages, nor been able to perform any other tests, like ping etc.
I am guessing that when power was restored, you brought up all your other servers first, and then started Xymon. Or they all started up around the same time.
The root cause of your issue, or rather the lack of issue, is that Xymon never received any reports of problems. Because it couldn't.
Yes, that's exactly what happened. Is no one shocked by this behaviour?
I would expect, at least, a white stripe in the "History" page of each status check. Nothing was reported for 6 hours, which never happens in everyday operation.
If your Xymon server was down how could it recognise that it had not received any data?
If it received data within 5 minutes of it restarting then I would fully expect it to show green for the intervening period. I don't think Xymon can recognise that it was itself down and therefore show white/purple in the history. The timer that causes these statuses would have been reset by the reboot which would mean that as long as it then receives data from the clients within the new refreshed timer period it would stay green.
The only way I can think of that would allow you to see when the power outage occurred on Xymon would be to have your Xymon server attached to a UPS that would keep it alive for at least 10 minutes.
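The timer reset described above can be illustrated with a toy model. This is not Xymon's code: the `ToyMonitor` class, its method names, and the idea that a restart simply re-arms every validity timer are assumptions made for the sketch; only the 30-minute figure comes from the thread.

```python
from dataclasses import dataclass

GREEN, PURPLE = "green", "purple"
VALIDITY = 30 * 60  # seconds; the 30-minute window mentioned in the thread


@dataclass
class Status:
    color: str
    valid_until: float


class ToyMonitor:
    """Toy model of a status store with expiry timers (hypothetical)."""

    def __init__(self):
        self.statuses = {}

    def report(self, host, color, now):
        # A client report sets the colour and re-arms the timer.
        self.statuses[host] = Status(color, now + VALIDITY)

    def expire(self, now):
        # Periodic check: anything past its validity goes purple.
        for status in self.statuses.values():
            if now > status.valid_until:
                status.color = PURPLE

    def restart(self, now):
        # Assumption: on restart the last known colours are reloaded
        # and every timer starts again from the boot time.
        for status in self.statuses.values():
            status.valid_until = now + VALIDITY


SIX_HOURS = 6 * 3600

# Case 1: the monitor goes down with everything else and restarts later.
down = ToyMonitor()
down.report("web1", GREEN, now=0)
down.restart(now=SIX_HOURS)
down.report("web1", GREEN, now=SIX_HOURS + 300)  # client reports after 5 min
down.expire(now=SIX_HOURS + 300)

# Case 2: the monitor stays up (e.g. on its own UPS) and hears nothing.
up = ToyMonitor()
up.report("web1", GREEN, now=0)
up.expire(now=3600)

print(down.statuses["web1"].color, up.statuses["web1"].color)
```

In the first case the status never leaves green, so no status change is ever recorded; in the second it goes purple once the 30 minutes have passed.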
Name & Registered Office: EXPRESS GIFTS LIMITED, 2 GREGORY ST, HYDE, CHESHIRE, ENGLAND, SK14 4TH, Company No. 00718151. Express Gifts Limited is authorised and regulated by the Financial Services Authority
On Wed, 27 Jul 2011 11:54:04 +0100, Neil Simmonds wrote:
If your Xymon server was down how could it recognise that it had not received any data?
Yes, but when it restarted, can't it see that no data or graphs had been updated for hours? What about updating the RRD graphs with a 6-hour hole? It turns purple when no data has been reported for 30 minutes, right? So Xymon does turn a status purple by itself, without receiving a notification, doesn't it?
BTW: I am already on a UPS, but it ran out of power :-/ Also, Hobbit starts after all the other servers because of server-side dependencies.
You have a valid point, but Xymon is - by design - essentially event-driven when it comes to updates: It won't update anything unless it receives some sort of notification that something has changed. And when it is down, it doesn't receive anything.
Yes, Xymon could look at the RRD files and see that they haven't been updated. There might be other reasons for that, though - e.g. the xymond_rrd module might have crashed, but the rest of Xymon has been running fine.
And what happens if all your tests go purple immediately when Xymon is brought back on-line ? You'll potentially fire off all the alerts you have configured. At a time when things are really running OK...
So it's a design choice to behave the way it does. If your entire datacenter has been down because of a power outage, people will know. They don't need the Xymon history to learn about that.
Regards, Henrik
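The RRD check Henrik mentions, looking at the files to see that they have not been updated, could be approximated outside Xymon with a small script. This is a sketch, not a Xymon feature: the function name, the 30-minute threshold, and the directory path in the example are all assumptions.

```python
import os
import time


def stale_rrd_files(directory, max_age_seconds, now=None):
    """Return (path, age_in_seconds) for .rrd files not modified recently."""
    now = time.time() if now is None else now
    stale = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.endswith(".rrd"):
                continue
            path = os.path.join(root, name)
            age = now - os.path.getmtime(path)
            if age > max_age_seconds:
                stale.append((path, age))
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever your RRD files live.
    for path, age in stale_rrd_files("/var/lib/xymon/rrd", 30 * 60):
        print(f"{path}: last updated {age / 3600:.1f} h ago")
```

As Henrik points out, a stale file does not prove the whole server was down; the RRD module alone may have stopped. The check also has to run before the first updates after a restart refresh the modification times.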
I think he does have a point here... couldn't it retroactively, when drawing up history and graphs, say "there's a big hole here where nothing was erased?" The way Xymon so far processes information might make this a non-trivial task, however -- don't know.
-- Ryan Novosielski - Sr. Systems Programmer, novosirj at umdnj.edu - 973/972.0922 (2-0922), Univ. of Med. and Dent., IST/CST-Academic Svcs. - ADMC 450, Newark
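The retroactive "there's a big hole here" idea raised above can be sketched as a simple gap search over update timestamps. This assumes you have a list of times at which data actually arrived (for example, extracted from RRD data); the thread describes Xymon's own history as driven by status changes, so such a list is hypothetical here, as are the function name and the sample values.

```python
def find_gaps(timestamps, max_interval):
    """Return (last_seen, next_seen) pairs separated by more than max_interval.

    timestamps must be sorted in ascending order, in seconds.
    """
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > max_interval:
            gaps.append((previous, current))
    return gaps


# Hypothetical updates every 5 minutes, with a 6-hour hole in the middle.
updates = [0, 300, 600, 600 + 6 * 3600, 900 + 6 * 3600]
for last_seen, next_seen in find_gaps(updates, max_interval=30 * 60):
    print(f"no data between t={last_seen}s and t={next_seen}s")
```

Each reported pair could then be drawn as a white or purple stripe in a history view, without firing alerts at restart time.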
In our case the UPS would alert Xymon that it is on battery and has lost input power. It would then start shutting down boxes after a few minutes, causing the box status reports to change colour, and finally Xymon itself would shut down cleanly.
Everything would already be red before it is turned back on again.
As part of a clean shutdown, though, maybe Xymon shouldn't leave things green for the time it is turned off. I've never really checked what it does at that point.
Craig
participants (8)
- craig_whilding@mentor.com
- david.baldwin@ausport.gov.au
- everett.vernon@gmail.com
- henrik@hswn.dk
- josh@imaginenetworksllc.com
- linuxmasterjedi@free.fr
- Neil.Simmonds@express-gifts.co.uk
- novosirj@umdnj.edu