We have a requirement that would allow us to move some more of our monitoring into Xymon if we could find a way of doing it.
Basically, the issue is this:
A failure message appears in a log file. From this we want an alert that stays active, rather than expiring after 30 minutes as log file alerts usually do.
Hopefully we will then get a message in the log file telling us that the failed process has completed; at that point we want to clear the alert.
Is anyone doing anything like this or have any idea how we might go about it?
Regards,
Neil Simmonds
On Thu, 22 Mar 2012 10:41:09 -0000, "Neil Simmonds" <Neil.Simmonds at express-gifts.co.uk> wrote:
Message appears in log file for failure - from this we want an alert that will stay active and not expire after 30 minutes like log file alerts usually do.
We will hopefully then get a message in the log file that tells us of completion of the failed process, at this point we want to clear the alert.
It's not something that the Xymon client will do automatically, but you can script your way out of it. What I would do is to create a custom test for this - something like this:
#!/bin/sh

# Logfile we monitor
FN="/var/log/mylogfile"

# Message patterns that say "alert" or "OK"
ALERTMSG="Something bad"
OKMSG="All OK"

# Use the data from the "logfetch" status to grab the last 5 minutes of
# log data (start at byte 0 if no position has been recorded yet)
FPOS=`cat $XYMONTMP/logfetch.${MACHINEDOTS}.status | grep "^${FN}:" | cut -d: -f2`
LASTMSG=`dd if="$FN" bs=1 skip=${FPOS:-0} 2>/dev/null | egrep "$ALERTMSG|$OKMSG" | tail -n 1`
# LASTMSG now holds the last message, which is either an alert or an OK
# message. Note: if nothing matched in that window, LASTMSG is empty and
# the status below turns green again; scanning the whole file avoids that.
#
# Actually the whole "cat ... grep ... cut ... dd .." thing is not needed,
# since you could just scan the entire logfile and pick out the last
# message which is either OK or alert... you could just do
# LASTMSG=`egrep "$ALERTMSG|$OKMSG" "$FN" | tail -n 1`

# Determine color
COLOR="green"
if test `echo "$LASTMSG" | grep -c "$ALERTMSG"` -ne 0
then
	COLOR=red
fi

# Send the status with a very long duration so it doesn't go purple.
$XYMON $XYMSRV "status+365d $MACHINE.mylog $COLOR `date`

Last message seen: $LASTMSG
"

exit 0
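The colour decision in the script above can be pulled out into a small function, so it can be tried against sample log lines without sending anything to Xymon. This is only a sketch, assuming the same ALERTMSG/OKMSG patterns used as extended regexes; the function name last_state_color is hypothetical.

```sh
#!/bin/sh
# Sketch: decide the status colour from the most recent alert-or-OK line.
# last_state_color is a hypothetical helper name, not part of Xymon.
#   $1 = extended regex for the alert message
#   $2 = extended regex for the OK (clearing) message
# Log lines are read from standard input.
last_state_color() {
    alert_re="$1"
    ok_re="$2"
    # Keep only lines matching either pattern, then take the newest one.
    last=$(grep -E "$alert_re|$ok_re" | tail -n 1)
    if printf '%s\n' "$last" | grep -Eq "$alert_re"; then
        echo red
    else
        # Either nothing relevant was logged, or the newest line is the OK.
        echo green
    fi
}
```

For example, last_state_color "Something bad" "All OK" < /var/log/mylogfile prints red until an "All OK" line follows the most recent "Something bad" line.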
This raises two interesting ideas:
- We should have status-messages that don't expire (go purple). Using a very long status lifetime is a kludge, really.
- The log analysis tool should know how to handle messages that cancel each other out.
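Messages that cancel each other out get more involved when several processes can fail independently, because a completion should only cancel a failure of the same process. A minimal sketch in awk, wrapped in a shell function, assuming hypothetical log lines of the form "FAILED job=NAME" and "COMPLETED job=NAME"; it prints the jobs whose most recent event is a failure.

```sh
#!/bin/sh
# Sketch: pair failure and completion messages per job, so a completion
# cancels an earlier failure of the same job. The "FAILED job=NAME" and
# "COMPLETED job=NAME" formats are hypothetical; adapt the patterns.
# Reads log lines on stdin; prints jobs still failed, one per line, sorted.
outstanding_failures() {
    awk '
    {
        # Find the job name in a "job=NAME" field, if there is one.
        job = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^job=/) job = substr($i, 5)
        if (job == "") next
        if ($0 ~ /FAILED/) state[job] = "failed"
        else if ($0 ~ /COMPLETED/) state[job] = "ok"
    }
    END {
        for (j in state)
            if (state[j] == "failed") print j
    }' | sort
}
```

An empty result would map to green and any output to red, with the listed job names included in the status text.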
Regards, Henrik
On Thu, 2012-03-22 at 12:11 +0100, henrik at hswn.dk wrote:
This raises two interesting ideas:
- We should have status-messages that don't expire (go purple). Using a very long status lifetime is a kludge, really.
Hello,
We have a similar requirement to the OP's. What would be nice is for the 'xymon' command to have a 'remove' command, which would remove a (permanent, non-purple) status for a given host/service. Along with the remove command would be an 'ID', which must be the same as the one sent with the original status message. That way only statuses with a known ID could be removed; a remove with an unknown ID would be ignored.
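Until something like that exists in Xymon itself, the ID check can be approximated on the sending side. This is a sketch under stated assumptions: STATEDIR, raise_alert and clear_alert are hypothetical names, XYMON and XYMSRV are the client environment variables used in Henrik's script, and "clearing" here simply means sending a long-lived green status rather than removing anything on the server.

```sh
#!/bin/sh
# Sketch: an alert tagged with an ID that can only be cleared by
# presenting the same ID. STATEDIR, raise_alert and clear_alert are
# hypothetical names; XYMON and XYMSRV come from the Xymon client
# environment. Nothing here changes how the Xymon server handles statuses.
STATEDIR="${STATEDIR:-/var/tmp/xymon-ids}"

# $1 = HOST.TEST, $2 = ID, $3 = message
raise_alert() {
    mkdir -p "$STATEDIR" || return 1
    printf '%s\n' "$2" > "$STATEDIR/$1.id"
    # Long lifetime so the alert does not go purple.
    $XYMON $XYMSRV "status+365d $1 red $(date)

$3"
}

# $1 = HOST.TEST, $2 = ID; refused unless $2 matches the recorded ID
clear_alert() {
    idfile="$STATEDIR/$1.id"
    [ -f "$idfile" ] || return 1
    [ "$(cat "$idfile")" = "$2" ] || return 1
    rm -f "$idfile"
    $XYMON $XYMSRV "status+365d $1 green $(date)

Alert $2 cleared"
}
```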
I won't go into details, but we have been running BB for several years using an in-house modified 'TheState' BB addon. This allows us to send permanent (non-purple) status messages when an event occurs, and then can remove them when required (via a web frontend which simply calls the 'bb' command to talk to the server).
What the OP has asked for we already do here, it is on my TODO list to see how we can get Xymon to do it too! :-)
John.
-- John Horne Tel: +44 (0)1752 587287 Plymouth University, UK Fax: +44 (0)1752 587001
I do something similar to what you're asking, to check one of our DHCP servers for "low dhcp pools" and update Xymon.
What I wanted to do: run swatch on the DHCP server, watch for messages in the log that indicate "low dhcp pool", then execute a script that updates Xymon. This would be ideal and real-time, but...
Because of issues, what I actually did: periodically ssh to the DHCP server (from my Xymon server), grab the logs from the last "x" minutes, grep out the DHCP pool info, then update Xymon based on what I found.
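A rough sketch of that polling approach, run from the Xymon server. DHCPHOST, XYMONHOST, LOGFILE and NLINES are hypothetical, a fixed number of lines stands in for "the last x minutes" (time-window filtering depends on the log's timestamp format), and red is an assumed colour choice. The host name in the status line is written with commas instead of dots, the same form as $MACHINE in Henrik's script.

```sh
#!/bin/sh
# Sketch of the "poll over ssh" approach, run from the Xymon server.
# DHCPHOST, XYMONHOST, LOGFILE and NLINES are hypothetical; a fixed
# number of lines stands in for "the last x minutes".
DHCPHOST="${DHCPHOST:-dhcp1.example.com}"
XYMONHOST="${XYMONHOST:-dhcp1,example,com}"  # dots written as commas
LOGFILE="${LOGFILE:-/var/log/messages}"
NLINES="${NLINES:-500}"

# Reads log lines on stdin and prints the colour to report.
dhcp_pool_color() {
    if grep -qi "low dhcp pool"; then
        echo red
    else
        echo green
    fi
}

# Fetch the recent log lines, pick a colour and send the status.
poll_dhcp() {
    color=$(ssh "$DHCPHOST" "tail -n $NLINES $LOGFILE" | dhcp_pool_color)
    $XYMON $XYMSRV "status $XYMONHOST.dhcp $color $(date)"
}
```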
Ken Connell Intermediate Network Engineer Computer & Communication Services Ryerson University 350 Victoria St RM AB50 Toronto, Ont M5B 2K3 416-979-5000 x6709
-----Original Message----- From: Neil Simmonds <Neil.Simmonds at express-gifts.co.uk> Sender: xymon-bounces at xymon.com Date: Thu, 22 Mar 2012 10:41:09 To: <xymon at xymon.com> Subject: [Xymon] Tricky one for log file monitoring
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
On Thu, 2012-03-22 at 11:54 +0000, kconnell at ryerson.ca wrote:
I do something similar to what your asking to check one of our DHCP servers to see about "low dhcp pools" and update Xymon.
I wanted to: Run swatch on the dhcp server and watch for messages in the log that indicate "low dhcp pool", then execute a script that updates xymon. This would be ideal and real time, but...
Within swatch could you not use the 'exec' command to invoke 'xymon' to update the Xymon server? Something like (completely untested!):
watchfor = /low dhcp pools/
exec = xymon <xymonserver IP> "status <localhostname>.dhcp red `date` DHCP pools getting low!"
(The 'exec' is all on one line.)
The man page for 'xymon' has more details.
John.
My issue was getting swatch working on this particular SunOS box, which I'm not too crazy about.
That's exactly what I had in mind though :)
----- Original Message ----- From: John Horne <john.horne at plymouth.ac.uk> Date: Thursday, March 22, 2012 9:04 am Subject: Re: [Xymon] Tricky one for log file monitoring To: xymon at xymon.com
On Thu, Mar 22, 2012 at 11:45 PM, John Horne <john.horne at plymouth.ac.uk> wrote:
Within swatch could you not use the 'exec' command to invoke 'xymon' to update the Xymon server? Something like (completely untested!):
watchfor = /low dhcp pools/
exec = xymon <xymonserver IP> 'status <localhostname>.dhcp red `date` DHCP pools getting low!'
Won't the test go purple after a while? A way to get around this would be to create a status file, and then have another process "refresh" Xymon based on the content of the file:
exec = echo "status <localhostname>.dhcp red `date` DHCP pools getting low!" > /var/tmp/dhcp.status
Then in tasks.cfg:
[dhcp]
	ENVFILE /usr/lib/xymon/server/etc/xymonserver.cfg
	CMD xymon <xymonserverIP> "`cat /var/tmp/dhcp.status`"
	INTERVAL 5m
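The status-file idea also gives a natural way to clear the alert: delete the file. A sketch of a refresh script that tasks.cfg could run instead of calling xymon directly; STATUSFILE, build_status, refresh_status and the fallback text are hypothetical, and XYMON and XYMSRV are assumed to be set in the task's environment (for example via the ENVFILE line). If the file exists its contents are re-sent every run; otherwise a green status is sent, so the test never goes purple.

```sh
#!/bin/sh
# Sketch: re-send a saved status, or report green when none is saved.
# STATUSFILE, the function names and the fallback text are hypothetical.
STATUSFILE="${STATUSFILE:-/var/tmp/dhcp.status}"

# $1 = HOST.TEST, used only when no saved status exists
build_status() {
    if [ -s "$STATUSFILE" ]; then
        cat "$STATUSFILE"
    else
        echo "status $1 green $(date) No DHCP pool alert recorded"
    fi
}

# Deleting STATUSFILE clears the alert on the next run.
refresh_status() {
    $XYMON $XYMSRV "$(build_status "$1")"
}
```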
J
On Mon, 2012-03-26 at 13:40 +1100, Jeremy Laidman wrote:
On Thu, Mar 22, 2012 at 11:45 PM, John Horne <john.horne at plymouth.ac.uk> wrote:
Within swatch could you not use the 'exec' command to invoke 'xymon' to update the Xymon server? Something like (completely untested!):
watchfor = /low dhcp pools/
exec = xymon <xymonserver IP> 'status <localhostname>.dhcp red `date` DHCP pools getting low!'
Won't the test go purple after a while?
As it stands, yes, so maybe use something like 'status+5d'.
A way to get around this would be to create a status file, and then have another process "refresh" Xymon based on the content of the file:
To achieve this with our old BB system with TheState, we had an 'expire' time included in the status message. A separate process then ran every 5 minutes or so, looked for 'expired' statuses and deleted them. This then allowed a 'default' status (usually green) to be shown. (Sorry, I probably didn't describe that too well; it is a tad complex, but it has worked well.)
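The expiry scheme described here can be sketched with one file per active alert holding its expiry time; a periodic job deletes expired entries and falls back to the default colour when none remain. This is not how TheState stores its data: ALERTDIR, the file format and the function names are hypothetical, and date +%s is supported by GNU and BSD date but is not POSIX.

```sh
#!/bin/sh
# Sketch: alerts with an expiry time, pruned by a periodic job.
# Each file in ALERTDIR holds "EXPIRY_EPOCH COLOR MESSAGE...".
# ALERTDIR, the file format and the function names are hypothetical.
ALERTDIR="${ALERTDIR:-/var/tmp/alerts}"

# $1 = alert name, $2 = seconds to live, $3 = colour, $4 = message
add_alert() {
    mkdir -p "$ALERTDIR" || return 1
    # date +%s works with GNU and BSD date, but is not POSIX.
    expiry=$(( $(date +%s) + $2 ))
    printf '%s %s %s\n' "$expiry" "$3" "$4" > "$ALERTDIR/$1"
}

# Deletes expired alerts, then prints the worst remaining colour
# (red > yellow > green). $1 = current time in epoch seconds.
prune_and_color() {
    now=$1
    worst=green
    for f in "$ALERTDIR"/*; do
        [ -f "$f" ] || continue          # glob matched nothing
        read -r expiry color rest < "$f"
        if [ "$expiry" -le "$now" ]; then
            rm -f "$f"
            continue
        fi
        case "$color" in
            red) worst=red ;;
            yellow) [ "$worst" = green ] && worst=yellow ;;
        esac
    done
    echo "$worst"
}
```

A task running every 5 minutes would call prune_and_color "$(date +%s)" and send the result as the test's colour, which reproduces the "default status once expired" behaviour.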
John.
participants (5)
- henrik@hswn.dk
- jlaidman@rebel-it.com.au
- john.horne@plymouth.ac.uk
- kconnell@ryerson.ca
- Neil.Simmonds@express-gifts.co.uk