So you could also flag the nodes with the "dialup" flag, which allows the nodes to go down without Xymon complaining that they are down. You would still have to write your own server-side script that determines whether a host is down when it shouldn't be and then raises an alert for it. Xymon can't do that out of the box, and I'm not sure any other monitoring system can either; most monitoring solutions expect nodes to be up 100% of the time and only make exceptions for scheduled downtime and the like.
Steve
On 14 August 2012 19:13, pankaj dorlikar <pankaj.dorlikar at gmail.com> wrote:
hi,
Thanks for the reply. We have clients installed on all the nodes. At any point in time, the nodes that are not running a job will be powered down. If a new job comes in, those nodes will be powered up, and other nodes that are not running any job will go down.
On 8/14/12, Steven Carr <sjcarr at gmail.com> wrote:
How are you monitoring the nodes? Do you have a Xymon client on each of the nodes, or are you doing a simple "ping" check to the node?
If you are just doing a simple ping check then, off the top of my head, I would make all nodes "noconn" in the hosts.cfg so Xymon doesn't actually ping them anymore. Then write a script which uses the data you have to ping the nodes and work out whether each node should be up or not; if a node is down and it shouldn't be, trigger a red alarm for that node.
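The decision logic suggested here could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the `-c` and `-W` options are those of the Linux `ping` utility, and how you learn that a node is expected to be down (for example from the scheduler) is left to the caller.

```python
import subprocess


def is_reachable(host, timeout_s=2):
    """Return True if one ICMP echo to host succeeds.

    Assumes the Linux ping utility, where -c is the packet count
    and -W is the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def node_color(reachable, expected_down):
    """Pick the colour to report for a node.

    reachable     -- result of the ping check
    expected_down -- True if the scheduler powered the node down on purpose
    """
    if reachable:
        return "green"
    if expected_down:
        # Down, but deliberately so: not an alarm.
        return "green"
    # Down and it should not be: raise a red alarm.
    return "red"
```

A wrapper script would call `is_reachable()` for each node, combine it with whatever the scheduler reports, and send the resulting colour to the Xymon server.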
Steve
On 14 August 2012 12:05, pankaj dorlikar <pankaj.dorlikar at gmail.com> wrote:
---------- Forwarded message ----------
From: pankaj dorlikar <pankaj.dorlikar at gmail.com>
Date: Tue, 14 Aug 2012 16:34:00 +0530
Subject: Re: [Xymon] Green status
To: Ryan Novosielski <novosirj at umdnj.edu>
Hi,
Thank you for the reply. But at any point in time, only some of the nodes will be down and all the other nodes will be up. If the server itself goes down, the monitoring of the rest of the working nodes will be affected.
On 8/14/12, Ryan Novosielski <novosirj at umdnj.edu> wrote:
What he is saying is that if there is an event that takes place where you can execute a script at the time it happens, you can disable the server by using the main binary's "disable" function. This binary used to be called "bb" but is now called "xymon" -- take a look at its man page to see how to send a disable message.
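As a rough sketch of what such a hook script might send, the helpers below build "disable" and "enable" messages and hand them to the command-line client. The function names, the default reason text and the server address are made up for illustration. The use of "*" for all tests, "-1" for "until OK", and commas in place of dots in the hostname reflect my reading of the xymon(1) man page, so check the exact syntax against the man page for your version (on older installs the binary is "bb").

```python
import subprocess


def disable_message(host, test="*", duration="-1",
                    reason="powered down by scheduler"):
    """Build a disable message for host.

    Assumptions to verify in the man page: "*" disables all tests,
    "-1" means "until OK", and dots in the hostname are written as commas.
    """
    return "disable {0}.{1} {2} {3}".format(
        host.replace(".", ","), test, duration, reason)


def enable_message(host, test):
    """Build the matching enable message for one test on host."""
    return "enable {0}.{1}".format(host.replace(".", ","), test)


def send_to_xymon(message, server="127.0.0.1", binary="xymon"):
    """Send message to the Xymon server using the command-line client.

    server and binary are placeholders; use "bb" on older installations.
    """
    subprocess.run([binary, server, message], check=True)
```

A power-down hook would then call something like `send_to_xymon(disable_message("node01"))` just before the node is switched off, and send the corresponding enable message when the node comes back.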
On 08/14/2012 03:55 AM, pankaj dorlikar wrote:
Hi,
Thank you for providing pointers and important clues. 1) Query regarding the "server-side test": We can know the status of the nodes which are down as per the scheduler's instructions. But how will this information help in setting the blue/green color for those nodes on the Xymon web page? I mean, how do we send this data to the Xymon server? Also, will it cover all the tests?
- How can a client send a "disable" command to the server?
thank you
-pankaj
On 8/14/12, cleaver at terabithia.org <cleaver at terabithia.org> wrote:
> We are using xymon-4.2.2 on a RHEL 5.2 server with more than 200
> clients (HPC cluster nodes).
>
> Our requirement is:
>
> -> If a node is powered down by the scheduler to save power,
> Xymon should show its state as green, and the same for the other
> tests of that node.
>
> Nodes powered down by the scheduler are identified by the pbsnodes
> command, which will show the state as power.
>
> -> If a node goes down for some reason other than being powered
> down by the scheduler, it should show red like normal clients.
Assuming your scheduler can have shell script hooks attached to events, I'd add something to send a "disable" command before it brings a node down, and then re-enable as it comes back up. If the nodes are being powered down without state being saved (eg, not suspending/resuming themselves), then just disable "until OK", otherwise I'd use some arbitrary future value.
Relevant tests will be blue (not green, as requested), but that will be handled as a non-event for SLA purposes.
Separately, it might be a good idea to have a separate server-side test that sends node state about each node to xymon independent of the node itself. That test is a fine place to put logic as well.
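One possible shape for such a server-side test is sketched below. It assumes that the pbsnodes output lists each node name on an unindented line followed by indented "key = value" lines, and that powered-down nodes carry the state "power" as described in the original question; both are assumptions to check against your scheduler. The test name "power" and the function names are hypothetical.

```python
def nodes_in_state(pbsnodes_text, wanted_state="power"):
    """Return the set of node names whose state includes wanted_state.

    Assumes each node name is on an unindented line, followed by
    indented "key = value" lines, one of which is "state = ...".
    """
    nodes = set()
    current = None
    for line in pbsnodes_text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start of a new node block.
            current = line.strip()
        elif current and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "state":
                states = [s.strip() for s in value.split(",")]
                if wanted_state in states:
                    nodes.add(current)
    return nodes


def status_message(host, test, color, text):
    """Build a Xymon status message; dots in the hostname become commas."""
    return "status {0}.{1} {2} {3}".format(
        host.replace(".", ","), test, color, text)
```

The test would run pbsnodes from cron or as a Xymon server extension, pass its output to `nodes_in_state()`, and send one status message per node with the xymon (or bb) client.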
HTH,
-jc
--
Ryan Novosielski - Sr. Systems Programmer
novosirj at umdnj.edu - 973/972.0922 (2-0922)
Univ. of Med. and Dent. | IST/EI-Academic Svcs. - ADMC 450, Newark
-- Pankaj V. Dorlikar
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon