Really, our situation is not so dramatic. Running the tests every 5 min. will affect performance, but will not block anything. We can allow that for a short time, but we have to avoid unnecessary load throughout the day. In most cases a fix is delivered quite quickly, and it would be nice to have more accurate resolution-time statistics.
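To illustrate the idea, here is a minimal Python sketch of the interval choice, not something Xymon provides out of the box. The function name, the 30-minute normal interval and the one-hour cap on fast re-testing are all assumptions chosen for illustration; only the 5-minute figure comes from the discussion.

```python
# All names and values here are hypothetical, chosen only for illustration.
NORMAL_INTERVAL = 30 * 60  # assumed usual spacing for an expensive test (seconds)
FAST_INTERVAL = 5 * 60     # re-test spacing while the test is failing (seconds)
FAST_WINDOW = 60 * 60      # assumed cap on how long fast re-testing may continue


def next_interval(color, seconds_failing):
    """Return how many seconds to wait before the next run of the test.

    color           -- last reported status colour, e.g. "green" or "red"
    seconds_failing -- how long the test has been non-green (0 if green)
    """
    if color == "green":
        return NORMAL_INTERVAL
    if seconds_failing < FAST_WINDOW:
        # Failure is recent: re-test often to catch the recovery time.
        return FAST_INTERVAL
    # Failure has lasted too long: fall back to avoid all-day load.
    return NORMAL_INTERVAL
```

The cap is what keeps the extra load limited to a short time: once a failure has lasted longer than the window, the test drops back to its normal schedule.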
Best regards,
Andrey Chervonets
SIA CoMinder http://www.cominder.eu/
From: Jeremy Laidman <jlaidman at rebel-it.com.au>
To: Andrey Chervonets <A.Chervonets at cominder.eu>
Cc: "xymon at xymon.com" <xymon at xymon.com>
Date: 03.05.2013 04:36
Subject: Re: [Xymon] Custom check interval for different status of custom tests
On 2 May 2013 17:07, Andrey Chervonets <A.Chervonets at cominder.eu> wrote: For some tests we do not want to check once per 5 min, just because of the performance impact of some complex requests.
If you re-run the checks more frequently when they fail, won't there be a performance impact? If the failure is due to load, then you might end up making things worse. Even if load isn't impacted, the people who are troubleshooting the problem might think your monitoring is the /cause/ of the problem, rather than a symptom.
I thought about trying to solve this in a generic way - having a script that looks for failures and does a re-test, perhaps for tests that are tagged for re-testing in hosts.cfg. However, I realised that very few of my tests would benefit from this and not be at risk of causing increased load during a time of trouble. Of those, I really would need to handle each one on a case-by-case basis, to determine an optimal balance of detecting resolution quickly vs limiting load caused by the tests. As it's a case-by-case assessment, I thought a generic solution wouldn't be appropriate.
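For what it's worth, the selection step of such a generic script might look roughly like the Python sketch below. It is only an illustration under assumptions: the function name and the input shapes are hypothetical, the set of tagged host/test pairs is assumed to have been collected elsewhere (for example from a custom tag in hosts.cfg), and fetching the current statuses and actually running the re-test are deliberately left out.

```python
# Hypothetical sketch: names and data shapes are assumptions for illustration.
FAILING_COLORS = {"red", "yellow"}  # colours treated as "failed" in this sketch


def retest_candidates(statuses, tagged):
    """Return the (host, test) pairs that are failing and opted in to re-testing.

    statuses -- iterable of (host, test, color) tuples for the current state
    tagged   -- set of (host, test) pairs explicitly marked for re-testing
    """
    return [
        (host, test)
        for host, test, color in statuses
        if color in FAILING_COLORS and (host, test) in tagged
    ]


# Example with made-up hosts and tests:
current = [
    ("web1", "report", "red"),
    ("web1", "http", "red"),
    ("db1", "report", "green"),
]
opted_in = {("web1", "report"), ("db1", "report")}
candidates = retest_candidates(current, opted_in)
```

Only tests that are both failing and explicitly tagged are returned, so anything not assessed case by case is never re-run early.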
J