Interesting question
I'm about to undertake writing a series of monitors for a custom app we have here in-house.
This app lives over a networked filesystem (think cvfs or gpfs) that is managed by two master nodes, a master and a failover. As a result, only one node at a time can answer the query I want to give it.
My conundrum:
If I make the query from my Xymon server against node1 and the service fails over to node2, node1 becomes completely unable to answer the question, and that check would go "red". The same happens when node2 fails back to node1.
If I run the check locally on each system, while one is working, the other will go "blue".
Have any of you ever written monitors for servers that carry a service in an active/passive configuration, and been able to keep the individual servers from going into some strange state as a result of failovers and such?
How did you handle it?
Jerald M. Sheets jr.
It depends on whether the app is UNIX or Windows based.
If it is UNIX based, I use CARP and an ssh tunnel.
I haven't set up anything for Windows in this setting yet.
Jim Sloan
Just remember, today is the day you thought tomorrow was going to be yesterday.
From: Jerald Sheets <questy at gmail.com> To: hobbit at hswn.dk Sent: Wed, December 30, 2009 11:34:12 AM Subject: [hobbit] Interesting question
I did something recently that checks the status of the conn column for a host before running a custom script.
I accomplished it using hobbitdboard
# Ask the Hobbit server for the current color of this host's ssh test.
BBJUNK=$(bb localhost "hobbitdboard host=${BBHOST} test=ssh fields=color")
if [[ "$BBJUNK" =~ green || -z "$BBJUNK" ]]; then
    # ssh is green, or the server has no ssh status for this host: launch.
    debugmsg Launching - $BBIP $BBHOST $SSHPRIV $BBJUNK
    nohup /usr/lib/hobbit/server/ext/pd2a.sh "$BBIP" "$BBHOST" "$SSHPRIV" &
else
    # Any other color means ssh is not healthy, so skip the custom script.
    outputmsg ssh down - $BBJUNK
fi
Maybe that's something you can adapt for your test?
I think I figured out a way to make this happen.
The command I need to run won't even return data on a machine that is not the primary server. So I will do two things. If the server is not primary and the command doesn't run, send a "clear" status back to the Xymon server. If the command does work, go to the main test routine and send whatever color reflects the status.
Running this on both hosts will keep things from going purple, and will get me the stats I need.
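For anyone wanting to try the same idea, here is a minimal sketch of how that logic might look as a client-side ext script. It is not the actual monitor: the column name "fsquery", the function names, and the rule that output containing "error" means red are all placeholders. It also assumes that a failing or silent query is the sign of the passive node, and that BB, BBDISP and MACHINE come from the usual Hobbit/Xymon client environment.

```shell
#!/bin/bash
# Sketch only: "fsquery" and the 'error' match are hypothetical placeholders.

# Map the query's exit status and output to a color.
# $1 = exit status of the query, $2 = its output
pick_color() {
    local rc="$1" output="$2"
    if [ "$rc" -ne 0 ] || [ -z "$output" ]; then
        # The query cannot answer here: assume this is the passive node.
        echo clear
    elif printf '%s\n' "$output" | grep -qi 'error'; then
        # Placeholder for the real test routine.
        echo red
    else
        echo green
    fi
}

# Build the text of a status message.
# $1 = hostname, $2 = column name, $3 = color, $4 = detail text
build_status() {
    printf 'status %s.%s %s %s\n\n%s\n' "$1" "$2" "$3" "$(date)" "$4"
}

# Run the query and send the result; meant to run on both nodes.
# $@ = the query command and its arguments
run_check() {
    local output rc color
    output=$("$@" 2>&1)
    rc=$?
    color=$(pick_color "$rc" "$output")
    # BB, BBDISP and MACHINE are assumed to be set by the client environment.
    "$BB" "$BBDISP" "$(build_status "$MACHINE" fsquery "$color" "$output")"
}
```

The script would end with something like run_check followed by the real query command. One caveat with this approach: a primary whose query is broken looks the same as a passive node, so both hosts could show clear at once.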
Once I get it functional, I may need to chat with you guys regarding setting up an NCV. I have tried the examples several times with no joy. But, enough of that for now. I have a new-year's celebration to get ready for.
Happy New Years, Hobbits!
Jerald M. Sheets jr.
On Wed, Dec 30, 2009 at 11:56 AM, Patrick Nixon <pnixon at gmail.com> wrote:
participants (3)
- odinn_asgaard@yahoo.com
- pnixon@gmail.com
- questy@gmail.com