Hi,
Yesterday I disabled a ping test 'until OK'. Last night, the ping test failed on the Hobbit server (4.2) with status clear:
    Wed Mar 4 00:05:42 2009 ping ok : System failure of the ping test
    Service ping on srvs3tsm is OK Hobbit system error
    System unreachable for 1508 poll periods (455337 seconds)
This also cleared the disable setting, so 5 minutes later the status showed an error and 10 minutes later an alert was triggered.
Is it normal that a 'system failure' of the ping test clears the 'disabled until OK' setting?
Stef
On Wed, Mar 04, 2009 at 10:27:53AM +0100, Stef Coene wrote:
> Yesterday I disabled a ping test 'until OK'. Last night, the ping test failed on the Hobbit server (4.2) with status clear:
>     Wed Mar 4 00:05:42 2009 ping ok : System failure of the ping test
>     Service ping on srvs3tsm is OK Hobbit system error
>     System unreachable for 1508 poll periods (455337 seconds)
> This also cleared the disable setting, so 5 minutes later the status showed an error and 10 minutes later an alert was triggered.
> Is it normal that a 'system failure' of the ping test clears the 'disabled until OK' setting?
A "system error" of the ping test is definitely NOT normal. Check the network test error log to see why it failed; it shouldn't do that. (Most likely it couldn't create the file where fping or hobbitping stores the results.)
Whether a "clear" status should count as OK in the "disabled until OK" sense can be discussed.
Regards, Henrik
On Wednesday 04 March 2009, Henrik Størner wrote:
> A "system error" of the ping test is definitely NOT normal. Check the network test error log to see why it failed; it shouldn't do that. (Most likely it couldn't create the file where fping or hobbitping stores the results.)
I know, I found the culprit:
    2009-03-04 00:05:47 Cannot create file /home/users/hobbit/server/tmp/ping-stdout.2056 : Permission denied
    2009-03-04 00:05:47 Cannot create file /home/users/hobbit/server/tmp/ping-stderr.2056 : Permission denied
    2009-03-04 00:05:50 xgetenv: Cannot find value for variable $BBTMP
    2009-03-04 00:05:50 hobbitping child could not create outputfiles in (null)
    2009-03-04 00:05:50 Cannot open ping output file /home/users/hobbit/server/tmp/ping-stdout.2056
This was caused by a configure script that does a chown to root on all Hobbit files and then changes the required files back to owner hobbit.
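The "Permission denied" lines in the log suggest that the network tester could not write to its temp directory. One way to confirm this after a permissions change is a small check like the one below, run as the hobbit user. This is only a sketch: the function name `can_create_file` is hypothetical, and the directory is simply the one that appears in the log.

```python
import os
import tempfile

def can_create_file(directory):
    """Return True if the current user can create a file in `directory`."""
    try:
        # mkstemp really creates a file, so this exercises the same
        # permission that the ping test needs for its output files.
        fd, path = tempfile.mkstemp(prefix="ping-check.", dir=directory)
    except OSError:
        return False
    os.close(fd)
    os.remove(path)
    return True

if __name__ == "__main__":
    tmpdir = "/home/users/hobbit/server/tmp"  # path taken from the log
    state = "writable" if can_create_file(tmpdir) else "NOT writable"
    print(tmpdir, state)
```

Running it as root would not tell you much, since root can usually write anywhere; it has to run as the account that runs the network tests.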
> Whether a "clear" status should count as OK in the "disabled until OK" sense can be discussed.
For me, OK = green. I didn't expect that a clear message would count as OK.
Stef
On Wed, 2009-03-04 at 10:27 +0100, Stef Coene wrote:
> Hi,
> Yesterday I disabled a ping test 'until OK'. Last night, the ping test failed on the Hobbit server (4.2) with status clear: [...] Is it normal that a 'system failure' of the ping test clears the 'disabled until OK' setting?
If clear is listed in OKCOLORS, then yes, that would be normal.
The default is:
    xymonserver.cfg:ALERTCOLORS="red,yellow,purple"   # Colors that may trigger an alert message
    xymonserver.cfg:OKCOLORS="green,blue,clear"       # Colors that may trigger a recovery message
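To make the effect of OKCOLORS concrete, here is a minimal sketch of the behaviour described in this thread. It is not Hobbit's actual code; the function names `parse_colors` and `still_disabled` are made up for illustration, and the only assumption is that a "disable until OK" is lifted as soon as a status arrives whose color is in OKCOLORS.

```python
def parse_colors(value):
    """Turn a setting like "green,blue,clear" into a set of color names."""
    return {c.strip().lower() for c in value.split(",") if c.strip()}

def still_disabled(disabled_until_ok, new_color, okcolors):
    """Return True if a 'disable until OK' survives a status of new_color.

    Assumption (from the thread, not from the Hobbit source): the disable
    is dropped when the incoming color is one of the OKCOLORS.
    """
    return disabled_until_ok and new_color.lower() not in okcolors

default_ok = parse_colors("green,blue,clear")   # default quoted above
strict_ok = parse_colors("green,blue")          # 'clear' no longer counts as OK

print(still_disabled(True, "clear", default_ok))  # disable is lifted
print(still_disabled(True, "clear", strict_ok))   # disable is kept
```

With the default setting, the clear status from the failed ping test counts as a recovery and lifts the disable; with `green,blue` it would not.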
-- Daniel J McDonald, CCIE #2495, CISSP #78281, CNX Austin Energy http://www.austinenergy.com
On Wednesday 04 March 2009, McDonald, Dan wrote:
> On Wed, 2009-03-04 at 10:27 +0100, Stef Coene wrote:
> > Hi,
> > Yesterday I disabled a ping test 'until OK'. Last night, the ping test failed on the Hobbit server (4.2) with status clear:
> > [...]
> > Is it normal that a 'system failure' of the ping test clears the 'disabled until OK' setting?
> If clear is listed in OKCOLORS, then yes, that would be normal.
> The default is:
>     xymonserver.cfg:ALERTCOLORS="red,yellow,purple"   # Colors that may trigger an alert message
>     xymonserver.cfg:OKCOLORS="green,blue,clear"       # Colors that may trigger a recovery message
Thx, I will change this to green,blue only.
Stef
-----Original message-----
From: Stef Coene [mailto:stef.coene at docum.org]
Sent: Wednesday 4 March 2009 15:26
To: hobbit at hswn.dk
Subject: Re: [hobbit] disable untill ok bug ?
> Thx, I will change this to green,blue only.
That will also mean that any page with a 'clear' status will no longer remain green. Make sure you want this effect as well.
Your one-time faulty ping was obviously not a common problem, and you have found the cause. It is unlikely that you will want to disable a ping test 'until OK' more often than you would encounter a non-green page due to a 'clear' test.
For example, I use the clear status as an in-between before going yellow or red, with the 'badtest' settings. I would not want the page to go grey just because one ping was missed, so we have 'badconn:1:3:5' in our host settings.
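As an illustration of what 'badconn:1:3:5' does, the sketch below maps the number of consecutive failed pings to a color. It assumes the usual reading of the badTEST:x:y:z option (x failures for clear, y for yellow, z for red); the helper names `parse_badtest` and `color_for` are hypothetical and this is not how Hobbit itself is implemented.

```python
def parse_badtest(setting):
    """Split a setting like "badconn:1:3:5" into (name, clear_at, yellow_at, red_at)."""
    name, clear_at, yellow_at, red_at = setting.split(":")
    return name, int(clear_at), int(yellow_at), int(red_at)

def color_for(failures, clear_at, yellow_at, red_at):
    """Color after `failures` consecutive failed tests (assumed semantics)."""
    if failures >= red_at:
        return "red"
    if failures >= yellow_at:
        return "yellow"
    if failures >= clear_at:
        return "clear"
    return "green"

_, x, y, z = parse_badtest("badconn:1:3:5")
for n in range(7):
    print(n, color_for(n, x, y, z))
```

Under this reading, one or two missed pings give clear, three or four give yellow, and five or more give red.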
//Danny.
> That will also mean that any page with a 'clear' status will no longer remain green. Make sure you want this effect as well.
A clear page is still grey, not green. As far as I can tell, it just means that if you disabled a test, a clear status will not undo the disable, so the test stays disabled.
Stef
-----Original message-----
From: Stef Coene [mailto:stef.coene at docum.org]
Sent: Thursday 5 March 2009 10:20
To: hobbit at hswn.dk
Subject: Re: [hobbit] disable untill ok bug ?
> > That will also mean that any page with a 'clear' status will no longer remain green. Make sure you want this effect as well.
> A clear page is still grey, not green. As far as I can tell, it just means that if you disabled a test, a clear status will not undo the disable, so the test stays disabled.
A page full of hosts that is all-green (every dot is green) will display as a green page.
If one dot (for example the 'conn' check on server X) changes to 'clear' (grey, as you wish) the page will remain green.
If later the dot ('conn' for server X again) changes to yellow, the entire page will also turn yellow (unless there is a 'NOPROPAGATEYELLOW' setting on that check, of course).
This is desired behaviour of Hobbit, as the 'clear' color is marked in the configuration as 'OK' and 'yellow' is not (see previous posts in this thread). If you change the configuration so that 'clear' is no longer 'OK', pages will change color to 'clear' (yes, grey).
This last bit may not be the behaviour the topic starter wants, so I wished to warn him about it.
If you were referring to the separate page for the actual check (dot) that turns grey, then yes, that will always be grey, as that page (the 'conn' page for server X, perhaps?) only refers to the actual check you are displaying, not a page full of hosts.
//Danny.
On Thursday 05 March 2009, Kip, D. - GDI/SNB wrote:
> If you change the setting in the configuration for 'clear' to not be 'OK', pages will change color to 'clear' (yes, grey).
Are you sure? Because all pages are green, even if there is a clear check.
I admit that I am not sure :)
But a colleague had some time back changed stuff around and we got grey pages as a result. I never bothered to look exactly what he changed and just put back the backup... But I know he was at least changing things in that config file :p
-----Original message-----
From: Stef Coene [mailto:stef.coene at docum.org]
Sent: Thursday 5 March 2009 13:26
To: hobbit at hswn.dk
Subject: Re: [hobbit] disable untill ok bug ?
On Thursday 05 March 2009, Kip, D. - GDI/SNB wrote:
> If you change the setting in the configuration for 'clear' to not be 'OK', pages will change color to 'clear' (yes, grey).
Are you sure? Because all pages are green, even if there is a clear check.
From the config file:
    OKCOLORS: Colors that may trigger a recovery message
In my case this is blue and green. I don't think this has anything to do with determining the color of the pages.
Stef