On Thu, 22 Mar 2012 10:41:09 -0000, "Neil Simmonds" <Neil.Simmonds at express-gifts.co.uk> wrote:
Message appears in log file for failure - from this we want an alert that will stay active and not expire after 30 minutes like log file alerts usually do.
We will hopefully then get a message in the log file that tells us of completion of the failed process, at this point we want to clear the alert.
It's not something that the Xymon client will do automatically, but you can script your way out of it. What I would do is to create a custom test for this - something like this:
#!/bin/sh

# Logfile we monitor
FN="/var/log/mylogfile"

# Message patterns that say "alert" or "OK"
ALERTMSG="Something bad"
OKMSG="All OK"

# Use the data from the "logfetch" status to grab the last 5 minutes of
# log data
FPOS=`cat $XYMONTMP/logfetch.${MACHINEDOTS}.status | grep "^${FN}:" | cut -d: -f2`
LASTMSG=`dd if=$FN bs=1 skip=$FPOS 2>/dev/null | egrep "$ALERTMSG|$OKMSG" | tail -n 1`
# LASTMSG now holds the last message which is either an alert or an OK
# message

# Actually the whole "cat ... grep ... cut ... dd .." thing is not needed,
# since you could just scan the entire logfile and pick out the last message
# which is either OK or alert... you could just do
LASTMSG=`egrep "$ALERTMSG|$OKMSG" $FN | tail -n 1`

# Determine color
COLOR="green"
if test `echo "$LASTMSG" | grep -c "$ALERTMSG"` -ne 0
then
   COLOR=red
fi

# Send the status with a very long duration so it doesn't go purple.
$XYMON $XYMSRV "status+365d $MACHINE.mylog $COLOR `date`

Last message seen: $LASTMSG
"

exit 0
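The colour decision in the script can be tried out without a Xymon installation. Below is a minimal sketch, assuming a hypothetical helper called log_color (my name for it, not part of Xymon) that takes the log file and the two patterns as arguments and prints the colour. It scans the whole file, so an alert stays red until an OK message follows it.

```shell
#!/bin/sh
# log_color is a hypothetical helper, not part of Xymon.
# Usage: log_color LOGFILE ALERT_PATTERN OK_PATTERN
# Prints "red" if the most recent line matching either pattern is an alert,
# and "green" otherwise (also when nothing matches or the file is missing).
log_color() {
    lc_file="$1"
    lc_alert="$2"
    lc_ok="$3"

    # Last line in the whole file that is either an alert or an OK message
    lc_last=`grep -E "$lc_alert|$lc_ok" "$lc_file" 2>/dev/null | tail -n 1`

    # Red only if that last matching line is an alert
    if printf '%s\n' "$lc_last" | grep -E -q "$lc_alert"
    then
        echo red
    else
        echo green
    fi
}
```

In the script this could stand in for the "Determine color" part, e.g. COLOR=`log_color "$FN" "$ALERTMSG" "$OKMSG"`. Note that a rotated log file loses the old alert line, so the status would turn green after rotation.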
This raises two interesting ideas:
- We should have status-messages that don't expire (go purple). Using a very long status lifetime is a kludge, really.
- The log analysis tool should know how to handle messages that cancel each other out.
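To illustrate the second idea, here is a rough sketch of what "messages that cancel each other out" could look like when more than one job writes to the same log. The line format (a "FAILED <job>" line that is later cancelled by a "COMPLETED <job>" line) and the helper name open_failures are assumptions made up for this example; neither Xymon nor the log file in question provides them.

```shell
#!/bin/sh
# open_failures is a hypothetical helper; the "FAILED <job>" and
# "COMPLETED <job>" line format is an assumption for illustration only.
# Usage: open_failures LOGFILE
# Prints the jobs whose FAILED line has not been followed by a COMPLETED
# line for the same job, one per line, sorted.
open_failures() {
    awk '
        $1 == "FAILED"    { pending[$2] = 1 }     # remember the failure
        $1 == "COMPLETED" { delete pending[$2] }  # a completion cancels it
        END { for (job in pending) print job }
    ' "$1" | sort
}
```

A custom test could then go red whenever open_failures prints anything, and list the outstanding jobs in the status text.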
Regards, Henrik