Matthew,
Hey guys, is there any way in Xymon to set up specific alerts based on Windows event message IDs and hardware logs? I have been tasked with monitoring the Windows server hardware logs, alerting only on those, and having the rest of the messages show up as green. My boss is only interested in hardware failures at the moment. We already have Nagios set up, but I used Hobbit in the past and was spoiled, so I want to set it up here and hopefully replace Nagios with Xymon. To do that I need to alert on a few small things to give them a taste of Xymon before I can get them to bite. I need to be able to alert on a cluster failover, on a hardware failure, and on a cert expiration date. If you guys can help me out with some or all of those things, we'll be able to get Nagios out and Xymon in.
Windows event logs are a minefield. Sure, they support a severity code (error/warning/info), but a lot of messages are misclassified or can safely be ignored. In my experience you want to be able to tune these out.
I wrote an extension, winevtmsgs.pl, which is available at: http://xymonton.org/monitors:winevtmsgs.pl
I don't know if anyone else has installed it, because it assumes you are forwarding all your logs via syslog to a central log host where you run the test. It works really well for me, and I have extensively customised the config file to tune the severity of any number of events up or down. For example, services often fail to start within the timeout period (especially at boot time, when the services they depend on may not yet be running). I can tolerate a one-off alert at boot time, but other services (e.g. Windows Defender) generate these messages constantly, so I tune them down.
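The tuning idea above can be sketched roughly as follows. This is not the winevtmsgs.pl config format; the override table, the field names, the event ID used and the colour mapping are all assumptions made for illustration. The first matching override wins, and anything unmatched falls back to the severity the event log reported.

```python
# Illustrative sketch only: NOT the winevtmsgs.pl config format.
# All names, fields and the event ID below are hypothetical.

# Default mapping from Windows severity to a Xymon-style colour.
SEVERITY_TO_COLOUR = {"error": "red", "warning": "yellow", "info": "green"}

# Overrides, checked in order: (source, event_id, message substring or None, colour).
OVERRIDES = [
    # A noisy service that constantly logs start timeouts: tune it down.
    ("Service Control Manager", 7009, "Windows Defender", "green"),
    # Any other service start timeout: tolerate as a warning.
    ("Service Control Manager", 7009, None, "yellow"),
]

RANK = {"green": 0, "yellow": 1, "red": 2}


def classify(event):
    """Return the colour for one event, applying the first matching override."""
    for source, event_id, needle, colour in OVERRIDES:
        if (event["source"] == source
                and event["id"] == event_id
                and (needle is None or needle in event["message"])):
            return colour
    # No override matched: fall back to the logged severity.
    return SEVERITY_TO_COLOUR.get(event["severity"], "yellow")


def overall_colour(events):
    """The test status is the worst colour among all events (green if none)."""
    return max((classify(e) for e in events), key=RANK.__getitem__, default="green")
```

Putting the more specific rule first means a noisy service can be silenced without hiding the same event from every other service.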
Hardware events will depend a lot on how well integrated the server hardware is with the event logs, and may require separate tests to be run. There will almost certainly be some vendor agents/utilities required. Some of these do log failure events to the event logs, but I prefer a regular polling approach, because if you miss the event message you will never alert on it. Testing can be done on the server itself or, if remote monitoring is supported (SNMP, etc.), from a central server.
On HP servers, I use devmon with SNMP and the server agents to monitor power, temperature, RAID, etc. (so it doesn't matter whether the server is running Windows, Linux, etc.), but there are other approaches.
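To illustrate why polling is more robust than waiting for an event, here is a minimal sketch that reports the current state of each component on every cycle, so a missed log message does not matter. The readings dictionary stands in for whatever the vendor agent or SNMP query returns, and the function names are hypothetical. The message layout follows the Xymon "status" message form (host name with dots changed to commas, then test name and colour), and 1984 is the usual Xymon port.

```python
import socket

RANK = {"green": 0, "yellow": 1, "red": 2}


def build_status(host, test, readings):
    """Build a Xymon status message from polled component colours.

    `readings` maps a component name to a colour; how it is obtained
    (vendor agent, SNMP, ...) is outside this sketch.
    """
    worst = max(readings.values(), key=RANK.__getitem__, default="green")
    # Xymon expects dots in the host name to be written as commas.
    header = f"status {host.replace('.', ',')}.{test} {worst} hardware poll"
    lines = [f"{name}: {colour}" for name, colour in sorted(readings.items())]
    return "\n".join([header] + lines)


def send_status(server, message, port=1984):
    """Send one message to the Xymon server over TCP (assumed default port)."""
    with socket.create_connection((server, port), timeout=10) as conn:
        conn.sendall(message.encode("utf-8"))
```

Run from cron or a Xymon client extension at a fixed interval, so that a fault which is still present shows up again on the next cycle.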
David.
-- David Baldwin - Senior Systems Administrator (Datacentres + Networks) Information and Communication Technology Services Australian Sports Commission http://ausport.gov.au Tel 02 62147830 Fax 02 62141830 PO Box 176 Belconnen ACT 2616 david.baldwin at ausport.gov.au Leverrier Street Bruce ACT 2617