Need some help with delayyellow / delayred
Hi,
I need some help understanding delayyellow and delayred in Xymon 4.3.30 compiled from source (not distro).
I've got some equipment that I'm pinging (conn) and checking web (http) on BMCs (Dell iDRAC and Oracle ILOM) and am having a LOT of trouble with very short-lived failures. As in, a host fails a test and then the xymonnet (?) retries succeed within the next minute.
So I've tried enabling delayyellow and delayred, first with 5 minutes and then with 10 minutes. But I'm still seeing color changes and receiving email notifications.
My hosts.cfg entries look like this:
192.0.2.1 test-client-ilom # NAME:Test-Client-ILOM ssh delayred=conn:10,ssh:10 delayyellow=conn:10,ssh:10
These are placeholders, but they are find & replace type placeholders.
Am I incorrect in thinking that delayyellow / delayred will cause failing test results to be ignored for the length of the delay value /before/ the color changes?
I suppose that I can switch from delayyellow / delayred in hosts.cfg to their counterpart in the alerts.cfg file and not send the email(s). But I'd rather the page not change colors, as we keep a screen open to it, and I'd really rather it not go false-yellow / false-red only to change back to green < 5 minutes later.
I'm sure that I'm missing something and am hoping that someone will help me figure things out and learn.
Thank you and have a good day.
-- Grant. . . . unix || die
Since you are using the delayred and delayyellow options for network tests (conn and ssh), the xymonnet-again.sh script will retest them every minute for up to 30 minutes (see man page for hosts.cfg). So your "delayred=conn:10,ssh:10 delayyellow=conn:10,ssh:10" option should delay changing these tests to red/yellow for up to 10 minutes after the first failure. I believe you have it configured correctly. Is the test perhaps flapping? There is also a noflap option that could be used if that is the case. I have a few systems that I use the delayred/delayyellow options on and they appear to be working as expected, such as this:
0.0.0.0 google.com # ?conn https://google.com/ sni HIDEHTTP delayred=http:10 delayyellow=http:10
Tom
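The behavior described above (per-minute retests, with the color change held back until the failure has lasted the configured delay) can be sketched roughly as follows. This is only an illustration of the expected semantics, not Xymon's actual code: the function name `displayed_colors` and the one-sample-per-minute model are assumptions made for the example.

```python
from typing import List


def displayed_colors(raw: List[str], delay_minutes: int) -> List[str]:
    """Return the color shown each minute, given one raw test result per minute.

    Assumed semantics (not taken from Xymon's source): a non-green result is
    held back, and the previously displayed color kept, until the test has
    been non-green for `delay_minutes` consecutive one-minute samples.
    """
    shown = []
    current = "green"
    bad_streak = 0  # consecutive non-green samples seen so far
    for color in raw:
        if color == "green":
            # Any success resets the streak and clears the status at once.
            bad_streak = 0
            current = "green"
        else:
            bad_streak += 1
            if bad_streak >= delay_minutes:
                current = color
        shown.append(current)
    return shown


# A two-minute "burp" never reaches the 10-minute threshold.
burp = ["green", "red", "red", "green", "green"]
print(displayed_colors(burp, 10))

# A sustained outage does change color, but only once the delay has elapsed.
outage = ["green"] + ["red"] * 12
print(displayed_colors(outage, 10))
```

Under that model a failure lasting a minute or two should never reach the web page, so a column that still goes red for < 2 minutes suggests the delay is not being applied to that test at all.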
On Fri, Aug 9, 2024 at 9:39 AM Grant Taylor via Xymon <xymon@xymon.com> wrote:
Xymon mailing list -- xymon@xymon.com To unsubscribe send an email to xymon-leave@xymon.com
On 8/9/24 11:43 AM, Tom Schmidt wrote:
Since you are using the delayred and delayyellow options for network tests (conn and ssh), the xymonnet-again.sh script will retest them every minute for up to 30 minutes (see man page for hosts.cfg).
ACK
I thought it would re-test every minute for the first 5 minutes, but 30 minutes is cool too. #TIL
So your "delayred=conn:10,ssh:10 delayyellow=conn:10,ssh:10" option should delay changing these tests to red/yellow for up to 10 minutes after the first failure.
That's what I thought and behavior I was trying to achieve.
I believe you have it configured correctly.
Thank you for the 2nd set of eyes.
Is the test perhaps flapping? There is also a noflap option that could be used if that is the case.
No, I don't think so.
It seems like the tests (conn/ping and/or http/https) periodically (once an hour or so for the sake of discussion) fail and Xymon causes the associated column to go red.
It almost always goes green again a minute or two after it went red. Time stamps in alert emails are usually one minute apart. Sometimes they fall within the same minute or are up to three minutes apart.
Test history shows that it was red for < 2 minutes.
It's just older / slower / cantankerous hardware that occasionally burps and fails a test.
I don't care about onesie-twosie test failures. I care about when it's been failing for 10-15 minutes.
Well ... I prefer no delay<COLOR>. But I'd rather not have color changes for burps on the known problematic systems. -- I hope that makes sense.
I have a few systems that I use the delayred/delayyellow options on and they appear to be working as expected, such as this:
ACK
0.0.0.0 google.com # ?conn https://google.com/ sni HIDEHTTP delayred=http:10 delayyellow=http:10
I'm not sure what the question mark in front of the conn does. I think sni causes the test to use Server Name Indication, which it doesn't do by default. The delayred / delayyellow is what I'm trying to get to work.
I wonder if the syntax isn't correct with the comma separating multiple tests. I'll try the following and see if that improves things:
delayred=conn:10 delayyellow=conn:10
Thank you Tom. :-)
-- Grant. . . . unix || die
The comma format is valid according to the doco. However I've never seen that usage before, so trying separate entries is a good idea.
Regarding the "?":
"By prefixing a test with "?" errors will be reported with a "clear" status instead of red. This is known as a test for a "dialup" service, and allows you to run tests of hosts that are not always online, without getting alarms while they are off-line."
J
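For anyone wanting to sanity-check a hosts.cfg line before blaming the syntax, here is a small sketch that pulls the delay tags out of one line and parses the comma-separated COLUMN:MINUTES form. It is a hypothetical helper, not part of Xymon: the names `parse_delay_tag` and `delay_tags` are made up for the example, and it says nothing about how Xymon itself treats a tag that is repeated on the same line.

```python
from typing import Dict, List, Tuple


def parse_delay_tag(tag: str) -> Dict[str, int]:
    """Parse 'delayred=conn:10,ssh:10' into {'conn': 10, 'ssh': 10}."""
    name, _, value = tag.partition("=")
    if name not in ("delayred", "delayyellow"):
        raise ValueError(f"not a delay tag: {tag!r}")
    delays = {}
    for item in value.split(","):
        column, sep, minutes = item.partition(":")
        # Each entry must look like COLUMN:MINUTES with a numeric delay.
        if not sep or not column or not minutes.isdigit():
            raise ValueError(f"malformed entry {item!r} in {tag!r}")
        delays[column] = int(minutes)
    return delays


def delay_tags(hosts_line: str) -> List[Tuple[str, Dict[str, int]]]:
    """List the delay tags found after the '#' on one hosts.cfg line.

    Tags are returned in the order they appear and are NOT merged; a token
    that merely contains 'delayred=' (for example one glued to the previous
    word by a lost space) is skipped, which makes such mistakes visible.
    """
    _, _, tags = hosts_line.partition("#")
    return [
        (token.split("=", 1)[0], parse_delay_tag(token))
        for token in tags.split()
        if token.startswith(("delayred=", "delayyellow="))
    ]


line = ("192.0.2.1 test-client-ilom # NAME:Test-Client-ILOM ssh "
        "delayred=conn:10,ssh:10 delayyellow=conn:10,ssh:10")
for name, delays in delay_tags(line):
    print(name, delays)
```

If a tag you expect is missing from the output, the line probably has a wrapping or spacing problem rather than a problem with the comma format.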
On Sat, 10 Aug 2024, 03:36 Grant Taylor via Xymon, <xymon@xymon.com> wrote:
On 8/9/24 9:54 PM, Jeremy Laidman wrote:
The comma format is valid according to the doco. However I've never seen that usage before, so trying separate entries is a good idea.
Well, separate entries didn't seem to work either.
192.0.2.1 test-client-ilom # NAME:Test-Client-ILOM ssh delayred=conn:10 delayred=ssh:10 delayyellow=conn:10 delayyellow=ssh:10
Regarding the "?":
"By prefixing a test with "?" errors will be reported with a "clear" status instead of red. This is known as a test for a "dialup" service, and allows you to run tests of hosts that are not always online, without getting alarms while they are off-line."
Thank you.
I suspect that other tests (e.g. ssh) will go clear if conn doesn't respond. But I'll have to test what happens if conn is good but ssh would otherwise go red.
-- Grant. . . . unix || die