Hi
On 23 January 2015 at 06:43, usa ims via Xymon <xymon at xymon.com> wrote:
I have a custom client test that I only want it to be executed once every day at 2:15am. After looking through the man pages of hosts.cfg and alerts.cfg, I am confused on what is the best course of action. From reading the archives, it was indicated to use the cron utility of Linux for it to start at ‘2:15am’.
The tasks.cfg allows specifying a "CRONDATE" so you don't need to use Linux cron, and it's probably better to keep it all within Xymon.
I would like to be alerted only once but the test to run every hour until recovered.
So normally you want the test to run once only at 2:15am. But if it's in a failed state, you want the test re-run every hour.
This is also what happens with the network tests, where the xymonnet-again.sh script looks for failed network tests and re-runs them every minute rather than the normal 5-minute interval. You can't make use of the xymonnet-again.sh script because it only works with the standard network tests (it simply runs xymonnet specifying the tests to repeat). But you could use the same idea.
In the clientlaunch.cfg of the xymon/hobbit client, I’m going to put 15m, am I correct?
Why 15 minutes? Did you mean 60 minutes?
[xxxxx]
    ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
    CMD $HOBBITCLIENTHOME/ext/xxxxx.pl
    LOGFILE $HOBBITCLIENTHOME/logs/xxxxx.log
    INTERVAL 60m
Here you could put (instead of INTERVAL) something like: CRONDATE 15 2 * * * to run at 2:15 every morning.
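Put together, the task entry might look something like the sketch below. It reuses the paths from the entry quoted above and only swaps INTERVAL for CRONDATE; it assumes the client's launcher is recent enough to understand CRONDATE, which is worth checking on an older Hobbit client.

```
[xxxxx]
    ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
    CMD $HOBBITCLIENTHOME/ext/xxxxx.pl
    LOGFILE $HOBBITCLIENTHOME/logs/xxxxx.log
    # minute hour day-of-month month day-of-week: 02:15 every day
    CRONDATE 15 2 * * *
```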
And in the script I’m going to put ‘status+24h’ so that it will not turn purple.
Yep.
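For illustration, here is a minimal sketch of how a script might build such a message. It is written in shell rather than the Perl of xxxxx.pl, and the function name build_status is hypothetical. It assumes the usual client convention of sending the message with $BB $BBDISP, which is shown only in a comment so that nothing is sent when the snippet is sourced.

```shell
#!/bin/sh
# Hypothetical helper: build a Xymon status message that stays valid for
# 24 hours, so a once-a-day test does not go purple between runs.
# Arguments: host, test name, colour, one-line summary.
build_status() {
    # Xymon expects dots in the hostname to be written as commas.
    host=$(printf '%s' "$1" | tr '.' ',')
    printf 'status+24h %s.%s %s %s' "$host" "$2" "$3" "$4"
}

# Typical use from a client extension (assumes BB, BBDISP and MACHINE are
# set by the ENVFILE named in clientlaunch.cfg):
#   $BB $BBDISP "$(build_status "$MACHINE" xxxxx green "$(date) all OK")"
```

A lifetime slightly longer than 24h may be safer in practice, since a run that finishes a little later than the previous day's would otherwise leave a short purple gap.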
In the alerts.cfg, I have the following:
HOST=xxxxx
    SCRIPT /etc/xymon/xxxxx-emails-geoxx.sh SERVICE=xxxxx COLOR=yellow REPEAT=24h FORMAT=SCRIPT RECOVERED
Yep, so this will run the script (I'm guessing) to send an email when the service fails and when it is restored.
What you could do is add another line to alerts.cfg to re-run your original monitoring script for the test, but only when the test is failing. Something like this (notice that I put the SERVICE on the first line so that it applies to both SCRIPT lines):
HOST=xxxxx SERVICE=xxxxx
    SCRIPT /etc/xymon/xxxxx-emails-geoxx.sh SERVICE=xxxxx COLOR=yellow REPEAT=24h FORMAT=SCRIPT RECOVERED
    SCRIPT $HOBBITCLIENTHOME/ext/xxxxx.pl "&host&" FORMAT=SCRIPT DURATION>5m DURATION<16h REPEAT=1h
So the second line will mean that an error condition will cause a re-test, and when the error condition stops, the re-test will also stop. Some important things to note here are:

a) If the script that does the checks (xxxxx.pl) operates on several hosts, then it should take the name of a host as its parameter and limit its checks to that host. Otherwise when one host fails, all of them would be re-tested.

b) The "DURATION" specification means that the re-test won't happen immediately on failure, but will wait for the REPEAT interval of 1h. There's no point re-testing within 3 seconds of the first test failing.

c) I've set the maximum DURATION to 16 hours because if you haven't fixed the problem by 6pm it probably won't get done until the next day. Adjust this as you see fit, but it's probably not worth having it more than 24h.
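As a rough illustration of limiting the checks to one host, here is a sketch of how a script could choose which host to re-test. It is in shell rather than Perl, and the function name target_host is hypothetical. It assumes that Xymon exports RCPT (the recipient given on the SCRIPT line) and BBHOSTNAME (the host the alert is about) to alert scripts, so the host can be picked up from the environment if it does not arrive as an argument. Verify this against the alerts.cfg man page for your version.

```shell
#!/bin/sh
# Hypothetical helper: decide which single host a re-test should cover.
# Order of preference: explicit argument, then RCPT, then BBHOSTNAME
# (the last two are assumed to be set by Xymon when it runs a SCRIPT alert).
target_host() {
    if [ -n "$1" ]; then
        printf '%s' "$1"
    elif [ -n "${RCPT:-}" ]; then
        printf '%s' "$RCPT"
    else
        printf '%s' "${BBHOSTNAME:-}"
    fi
}

# Example use inside the check script:
#   host=$(target_host "$1")
#   [ -n "$host" ] || { echo "no host given" >&2; exit 1; }
```

An empty result means no host was supplied; the script can then fall back to its normal behaviour of checking every host, as it would on the scheduled 2:15am run.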
J
participants (1)
-
jlaidman@rebel-it.com.au