Problem with TIME qualifier to PROC test
Oh wise people of the Xymon clan
I occasionally have logrotate spawn a post-rotate process that hangs and needs to be manually killed. This is easily detected because the parent process also persists as "/bin/sh /etc/cron.daily/logrotate". I want to detect when logrotate has been running for more than an hour, and as it starts at 1am, I added the following line into my hosts' analysis.cfg file:
PROC "%^/bin/sh /etc/cron.daily/logrotate" 0 0 "TEXT=/etc/cron.daily/logrotate" TIME=*:0200:0100
My understanding is that this makes the rule apply only between 2am and 1am, or in other words, forces "green" between 1am and 2am.
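To spell out that reading of the timespec, here is a minimal Python sketch of a daily window that wraps past midnight. It is not Xymon's code: the function name in_window is made up for illustration, and treating the window as half-open (start included, end excluded) is an assumption.

```python
def in_window(now: int, start: int, end: int) -> bool:
    """Return True if the HHMM value `now` lies in [start, end).

    Hypothetical helper, not part of Xymon. Times are plain HHMM
    integers, e.g. 0200 is written as 200.
    """
    if start <= end:
        # Ordinary window within a single day.
        return start <= now < end
    # Window wraps past midnight: active from `start` to the end of the
    # day, and again from 0000 up to (but not including) `end`.
    return now >= start or now < end


# TIME=*:0200:0100 read as "active from 02:00 until 01:00 the next day".
for hhmm in (30, 130, 300, 2359):
    print(f"{hhmm:04d}", in_window(hhmm, 200, 100))
```

Under these assumptions only the 0130 sample falls outside the window, which matches the intent of suppressing the rule between 1am and 2am.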
I added this line yesterday.
Alas, 2 of the 6 hosts showed a red "procs" alert starting about 10 seconds after 1am (when logrotate runs) and lasting 5 minutes. So the rule did not behave as I expected.
Curiously, two servers that have an ongoing problem with a hung logrotate process (which inspired the check I'm trying to implement) showed green for the hour from 1am to 2am, then went back to red. So this indicates that the time qualifier is being handled correctly by those servers.
I can't figure out why it went red for the other servers that don't have a hung logrotate process. Any ideas?
Perhaps this is due to a time lag between the client data delivery and the analysis. I wonder if I should change the timespec to "TIME=*:0200:0055"?
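One other possibility worth ruling out is how the end of the window is compared. The Python sketch below is only a thought experiment: rule_active and its end_inclusive flag are hypothetical, and I have not checked whether Xymon treats the end time as inclusive or compares at minute granularity. It just shows that a report stamped 01:00:10 could still fall inside 0200:0100 if the end minute is included, while 0200:0055 excludes it either way.

```python
from datetime import time


def rule_active(now: time, start: int, end: int, end_inclusive: bool) -> bool:
    """Hypothetical window check at minute granularity (not Xymon's code)."""
    hhmm = now.hour * 100 + now.minute  # seconds are dropped here
    before_end = hhmm <= end if end_inclusive else hhmm < end
    if start <= end:
        return hhmm >= start and before_end
    # Window wraps past midnight.
    return hhmm >= start or before_end


alert_time = time(1, 0, 10)  # roughly when the red "procs" alert appeared

for end in (100, 55):
    for inclusive in (True, False):
        active = rule_active(alert_time, 200, end, inclusive)
        print(f"end={end:04d} inclusive={inclusive} active={active}")
```

If the end minute were inclusive, the original rule would still apply during 01:00, so moving the end to 0055 would sidestep both that and any delivery lag of up to five minutes.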
Cheers Jeremy
jlaidman@rebel-it.com.au