Hi All,
I seem to have a bit of a problem with a large number of devices reporting http flapping. I am monitoring https://[ip_address]:444 in hosts.cfg. The SSL column always remains green, but the http column flaps with the status message:
https://[ip_address]:444/ - SSL error
Seconds: 5.08
Is there any way to increase the time that the http test waits for a response, or is there any other way to prevent the flapping/red status? I have 90 such devices that consume a lot of RAM, and the CPU regularly goes up to 90%+ (I collect SNMP stats via devmon, so the flapping here can be controlled with thresholds). I presume this is why the http response can be slow, though it would normally return within 8s. Hence the question: can the http response timeout in Xymon be increased?
Cheers, Phil
Hi All,
OK, I realise now that this doesn't have anything to do with the https test time. I've tried increasing the test time directly via xymonnet --timeout, and the test still flaps with the same frequency. The main issue seems to be the somewhat non-descriptive "SSL error" message. The sslcert check does not go red when the http check fails. The http check fails at different times: some hosts go red after 5 seconds, others after up to 9 seconds. I've checked through all the xymon logs and cannot find anything relating to this error. Has anyone got any ideas?
Cheers, Phil
---------- Forwarded message ----------
From: Phil Meech <pmeech at gmail.com>
Date: 12 May 2011 11:01
Subject: Xymon 4.3.2 http flapping with SSL error
To: xymon at xymon.com
> I seem to have a bit of a problem with a large number of devices reporting http flapping. I am monitoring in hosts.cfg https://[ip_address]:444. The SSL column always remains green, but the http column flaps with the status message:
> https://[ip_address]:444/ - SSL error
This usually means that connecting to the service failed. There might be some additional information in the xymonnet logfile.
The "sslcert" column (I assume that is the one you mean) only reports the status of the SSL certificate; if it cannot connect to the server then there is no certificate to report on, and hence that status is not updated (and will eventually go purple).
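To see what lies behind the generic "SSL error", one option is to attempt the same connection by hand and look at the underlying exception. The following is only a rough sketch, not what xymonnet does internally: `probe_tls` is a hypothetical helper name, the 10-second default timeout is arbitrary, and certificate verification is switched off on the assumption that devices addressed by IP may present self-signed certificates.

```python
import socket
import ssl
import time


def probe_tls(host, port=444, timeout=10.0):
    """Try a TCP connect plus TLS handshake; return (ok, seconds, detail)."""
    # Assumption: certificate checks are disabled, since devices addressed
    # by IP may use self-signed certificates. Only connectivity is tested.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw) as tls:
                detail = "handshake ok, " + str(tls.version())
        return True, time.monotonic() - start, detail
    except OSError as exc:
        # ssl.SSLError, refused connections and timeouts are all OSError
        # subclasses, so the exception type shows which stage failed.
        detail = type(exc).__name__ + ": " + str(exc)
        return False, time.monotonic() - start, detail
```

Calling `probe_tls` repeatedly against one of the flapping devices and comparing the results with the red periods in Xymon might show whether the failures are refused connections, timeouts or handshake errors.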
> status? I have 90 such devices that consume a lot of RAM and the CPU regularly goes up to 90%+ (I collect SNMP stats via devmon so the flapping here can be controlled with thresholds). I presume this is why the http response can be slow, though it would normally return in 8s.
If increasing the timeout doesn't help, perhaps you can use something like "badhttp:3:3:5" to delay alerting until the test has failed for more than one poll cycle? This example would cause it to go yellow after 3 failed poll cycles (15 minutes) and red after 5 failed cycles (25 minutes).
Regards, Henrik
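The effect of "badhttp:3:3:5" can be illustrated with a small simulation. This is a sketch of the x:y:z thresholds as I read the hosts.cfg documentation (green below x consecutive failures, clear from x, yellow from y, red from z), not Xymon's actual code; the function names `bad_test_color` and `simulate` are made up for the example.

```python
def bad_test_color(consecutive_failures, x=3, y=3, z=5):
    """Map a count of consecutive failed polls to a status colour."""
    # Assumed semantics of badTEST:x:y:z, checked from most to least severe.
    if consecutive_failures >= z:
        return "red"
    if consecutive_failures >= y:
        return "yellow"
    if consecutive_failures >= x:
        return "clear"
    return "green"


def simulate(results, x=3, y=3, z=5):
    """Return the colour after each poll; results is a list of booleans."""
    colors = []
    failures = 0
    for ok in results:
        # A successful poll resets the run of consecutive failures.
        failures = 0 if ok else failures + 1
        colors.append(bad_test_color(failures, x, y, z))
    return colors


# Six failed polls followed by one success.
print(simulate([False] * 6 + [True]))
# → ['green', 'green', 'yellow', 'yellow', 'red', 'red', 'green']
```

With x and y both set to 3 the "clear" state is never shown, and a host that fails only one or two polls in a row stays green, which is what suppresses the flapping.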
Hi Henrik,
Thanks for your input; it has solved my issue!
Cheers, Phil
On 23 May 2011 15:23, Henrik Størner <henrik at hswn.dk> wrote:
> If increasing the timeout doesn't help, perhaps you can use something like "badhttp:3:3:5" to delay alerting until the test has failed for more than one poll cycle? This example would cause it to go yellow after 3 failed poll cycles (15 minutes) and red after 5 failed cycles (25 minutes).
Regards, Henrik
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon