big brother replacement
Hello list,
It's that time of year again - we're looking for alternatives to our aging bb infrastructure. Although it's been helped by the bbgen extensions, it is showing its age and is getting harder to support as time goes by.
Of all the potential replacements we've looked at, I don't really like any of them - the commercial bb stuff is uninspiring, and their linux support is lacking. The other solutions tend to be heavyweight j2ee and database apps, or oddities like nagios. What I'd really love to find is something like an up-to-date version of big brother+bbgen, something like hobbit.
Unfortunately, last I checked, hobbit still lacked a crucial capability that we depend on, the built-in bb failover mechanism. We have 2 data centers, several hundred miles apart, with bb servers in several lans at both sites. Each bb server has a twin at the other location, and they both monitor the servers in both data centers, but only one of the bb servers does reporting, as determined by the failover state. The bb failover has worked marvelously, and has kept bb firmly in place so far, despite the other advantages of hobbit.
So, the $64 question: Is there anything in hobbit, or on the horizon, which will allow hobbit to serve as a drop-in replacement for bb, including the failover capability?
Thanks for your words of wisdom.
Joe
Let me see if I understand. You have several bb servers at one datacenter, each with their twin at the other datacenter, and both sets do the tests. They report to one central display server, but only one set reports at a time, depending on failover state, correct?
Is this failover automatic? If so, how is this failover determined? What if this failover has a false positive? If not, what is your timeframe to swap over?
Tod Hansmann Network Engineer
-----Original Message----- From: Sloan [mailto:joe at tmsusa.com] Sent: Thursday, November 01, 2007 4:20 PM To: hobbit at hswn.dk Subject: [hobbit] big brother replacement
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
Tod Hansmann wrote:
> Let me see if I understand. You have several bb servers at one datacenter, each with their twin at the other datacenter, and both sets do the tests. They report to one central display server, but only one set reports at a time, depending on failover state, correct?
You have the basic idea, but there is no single central server, just pairs of bb servers, one to a data center, in each lan which is being monitored. For each pair of bb servers, only the server at data center A does reporting, unless the server in data center B cannot reach the server in data center A, in which case the server in data center B will take over the reporting duties until the bb server in data center A becomes reachable again. While this could theoretically lead to a split brain condition, the failover condition has only ever triggered when there was a wan outage.
> Is this failover automatic? If so, how is this failover determined? What if this failover has a false positive? If not, what is your timeframe to swap over?
IIRC, it takes one bb cycle to kick in.
We've not seen a false positive, as I mentioned above.
It's just the standard built-in bb failover -
head ~bb/ext/failover follows:
#!/bin/sh
# failover
# BIG BROTHER - FAILOVER SCRIPT
# Sean MacGuire
# (c) Copyright Quest Software, Inc. 1997-2003 All rights reserved.
# failover WATCHES BBNET and BBPAGER
# IF BBNET OR BBPAGER BECOMES UNAVAILABLE, THEN TAKE OVER UNTIL THEY RETURN
# To use, just add failover to the BBEXT variable in etc/bbdef.sh
# To configure BBPAGER failover:
# define both the primary and failover machines as BBPAGERS in etc/bb-hosts
# and set bbwarn: FAILOVER in etc/bbwarnsetup.cfg
Joe
I see what you're saying, but you still have to manually specify which server you're connecting to. If bb1.domain.tld cannot be reached, the techs have to manually enter bb2.domain.tld - correct?
I know of several BB ext scripts that work perfectly with Hobbit, and even more that just needed a small tweak. Would you be able to post the entire ext script? Hopefully Henrik is willing to answer your $64 question =)
Josh
-- Josh Luthman Office: 937-552-2340 Direct: 937-552-2343 1100 Wayne St Suite 1337 Troy, OH 45373
Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer
I think he's just looking for the alerts. From what he's indicating, it doesn't look like he's too concerned about the display (unless he has a bunch of web pages up at once all the time).
Tod Hansmann
Network Engineer
<http://www.directpointe.com/>
From: Josh Luthman [mailto:josh at imaginenetworksllc.com] Sent: Thursday, November 01, 2007 5:12 PM To: hobbit at hswn.dk Subject: Re: [hobbit] big brother replacement
Josh Luthman wrote:
> I see what you're saying, but you still have to manually specify which server you're connecting to. If bb1.domain.tld cannot be reached, the techs have to manually enter bb2.domain.tld - correct?
Well, no need to enter anything manually. We have on each big brother server a page with links to all the other bb servers, and I'm sure the support folks have links bookmarked, so if the DC1 data center is down, then instead of clicking on e.g. dc1bbdata, they'd click on dc2bbdata. Normally the 2 bb display servers in each pair provide an identical view, so this only matters in the event of an outage.
> I know of several BB ext scripts that work perfectly with Hobbit, and even more that just needed a small tweak. Would you be able to post the entire ext script? Hopefully Henrik is willing to answer your $64 question =)
Sure, I can post the entire script - it's in the attachment -
Joe
I see now - you've a redundant BBNET. I haven't used BB in several weeks and I never got really complex with it - a lot of ping tests were what I needed out of it. Can you explain what BBNET is?
Well, bb being somewhat modular, there are 3 components: BBNET, BBPAGER, BBDISPLAY
BBNET is the component of bb that tests the network services/connectivity on the remote hosts.
Joe
That would correspond to Hobbit's bbtest, I believe - someone correct me if I'm wrong, I'm just guessing here!
I don't see any reason to think that script wouldn't work with some variable changes like
BBNET=`$GREP BBNET $BBHOSTS | $GREP "^[0-9]" | $GREP -v "^\#"`
but I am not an expert by any means!
Getting back to version 3.3 - after 1.9btf, Quest started selling the product. I don't know the exact history behind it, but 1.9btf is what you get without paying for anything. I have worked with and continue to monitor a network with 3.1 or 3.2 (they decided to revert from 3.3 as it looks quite a bit different on the BBDISPLAY) and I honestly don't see what they've changed between 1.9 and 3.2. Most of the features of BB I heard of or read about were not only already in Hobbit but were even better than what I had heard. Not to mention the dozens of BB scripts that can be migrated relatively painlessly.
Josh
Josh Luthman wrote:
> That would correspond to Hobbit's bbtest, I believe - someone correct me if I'm wrong, I'm just guessing here!
> I don't see any reason to think that script wouldn't work with some variable changes like
> BBNET=`$GREP BBNET $BBHOSTS | $GREP "^[0-9]" | $GREP -v "^\#"`
> but I am not an expert by any means!
Well, depending on what I hear on this list in the next day or so, taking a crack at adapting the old bb failover script may be my best option.
> Getting back to version 3.3 - after 1.9btf, Quest started selling the product. I don't know the exact history behind it, but 1.9btf is what you get without paying for anything. I have worked with and continue to monitor a network with 3.1 or 3.2 (they decided to revert from 3.3 as it looks quite a bit different on the BBDISPLAY) and I honestly don't see what they've changed between 1.9 and 3.2. Most of the features of BB I heard of or read about were not only already in Hobbit but were even better than what I had heard. Not to mention the dozens of BB scripts that can be migrated relatively painlessly.
Ah, interesting - I always had the feeling that quest didn't do much of anything with the code except put in some verbiage and legal warnings, and tried to push their own proprietary and non linux-friendly stuff and left the bb code base to slowly decay.
Joe
You think their lack of linux support is bad? Get a quote.
I'd be using Henrik's solution as follows, given your situation:
"I run two completely separate systems in parallel, and have the clients report to both of them. The system at our disaster center has the paging module disabled (just disable the [bbpage] section in hobbitlaunch.cfg), to avoid double alerts - it is simple to activate it, if necessary.
"Config files are rsync'ed from the primary site to the disaster site regularly."
Though to be honest, this failover script may be something that can be converted for use in hobbit. You might be better off going with one of a dozen options that differ slightly from how you have it set up, but that's up to you.
Hobbit doesn't have this built-in. That's for sure. I would think it's fairly easy to use it to get much the same effect, though. I'll wait for others' responses on your situation and throw my own thoughts back in tomorrow morning.
Tod Hansmann Network Engineer
-----Original Message----- From: Sloan [mailto:joe at tmsusa.com] Sent: Thursday, November 01, 2007 5:03 PM To: hobbit at hswn.dk Subject: Re: [hobbit] big brother replacement
I'm not entirely sure what you mean when you reference the failover capability. Could you please explain how this works? I'm interested in knowing how the hostname reflects to what IP addresses, hardware running what software specifically, etc. Coming from BB1.9btf I don't know of many expansions between 1.9 and 3.3.
We had some discussion about multiple servers and redundancy just a short while ago: http://www.hswn.dk/hobbiton/2007/10/msg00423.html
Josh Luthman wrote:
> I'm not entirely sure what you mean when you reference the failover capability. Could you please explain how this works?
I'm basically just a user of the capability; how much detail do you want? It's just the standard failover built into bb.
> I'm interested in knowing how the hostname reflects to what IP addresses, hardware running what software specifically, etc.
How the hostname reflects to what IP address? I'm not sure what you mean. There are no tricks here, just the standard dns naming scheme.
I'm not sure what hardware has to do with it, but we're running SLES on HP/Compaq DL servers.
I'm not sure what you mean by "what software" - you mean the OS, or the applications being monitored, or the exact version of bb?
> Coming from BB1.9btf I don't know of many expansions between 1.9 and 3.3.
We're on 1.9 here, patched with bbgen to keep it going - AFAIK the bb code has basically languished since it was bought back in 2001 or so, so I'm curious about this version 3.3 that you speak of.
> We had some discussion about multiple servers and redundancy just a short while ago: http://www.hswn.dk/hobbiton/2007/10/msg00423.html
Yes, those discussions look mostly like the typical ha requirements, e.g. managing bb failover via external proxies, redirectors etc, which adds a whole new layer of cost and complexity. It would be a hard sell to justify all the new ha infrastructure if we are replacing bb with something newer and better, since bb currently handles that all by itself, with no need of an external ha system.
Joe
Hi Joe,
On Thu, Nov 01, 2007 at 03:20:12PM -0700, Sloan wrote:
> So, the $64 question: Is there anything in hobbit, or on the horizon, which will allow hobbit to serve as a drop-in replacement for bb, including the failover capability?
The BB "failover" script does two things: It makes the network tests run on the failover server if the primary BBNET server cannot be pinged; and it enables alerts to be sent from the failover server if there is no connection from the failover server to the primary BBPAGER server.
The network-test failover is fairly simple to do. I've attached two scripts here, both of which must run on the backup/standby/failover server:
failover.sh - goes in ~hobbit/server/ext/
Add a section to hobbitlaunch.cfg with

[failovercheck]
	ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD $BBHOME/ext/failover.sh 10.0.0.1 hobbitnet.mydom.com

"10.0.0.1" is the IP of your primary Hobbit server, "hobbitnet.mydom.com" is the hostname (in the bb-hosts file) of the primary network test machine.
This script queries the primary Hobbit server for how long ago the network tests were last updated. If that was more than 7 minutes ago, it deems the primary network test node to be DOWN, and flags this via the file $BBTMP/primarynetDOWN. If the network test update was less than 7 minutes ago, it removes the file.
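For readers without the attachment, the decision described above can be sketched roughly as follows. This is not the attached failover.sh: the function names and the MAXAGE variable are made up for illustration, and how the "last update" timestamp is fetched from the primary server is deliberately left out. Writing the primary's IP into the flag file follows the description of the second script, which reads the IP back from it.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the attached failover.sh.
# primary_state, update_flag and MAXAGE are hypothetical names.

MAXAGE=420   # 7 minutes, in seconds

# Print DOWN if the last network-test update is older than MAXAGE, else UP.
# $1 = epoch time of the last update, $2 = current epoch time
primary_state() {
    if [ $(( $2 - $1 )) -gt "$MAXAGE" ]; then
        echo DOWN
    else
        echo UP
    fi
}

# Create or remove the flag file so that it matches the state.
# $1 = state (UP or DOWN), $2 = flag file path, $3 = IP of the primary server
update_flag() {
    if [ "$1" = DOWN ]; then
        # Store the primary's IP so another script can pick it up from the file.
        echo "$3" > "$2"
    else
        rm -f "$2"
    fi
}
```

A wrapper would call something like update_flag "$(primary_state "$last" "$(date +%s)")" "$BBTMP/primarynetDOWN" 10.0.0.1 on each run, where $last is whatever timestamp the query to the primary server returned.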
This is then used by the other script, which replaces the CMD in the "[bbnet]" section in hobbitlaunch.cfg.
failovernet.sh - goes in ~hobbit/server/ext/
When this runs to do the normal network tests, it will check for the presence of the $BBTMP/primarynetDOWN file. If this file exists, it picks up the IP of the primary Hobbit server from the file, and modifies the settings to report data to both the normal (local) Hobbit server, and to the primary server. If the file does not exist, it will just run the network tests the normal way.
So to run this, modify the [bbnet] section in hobbitlaunch.cfg and change the CMD setting to "$BBHOME/server/ext/failovernet.sh"
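The flag-file check in this second script might look something like the sketch below. Again, this is not the attached failovernet.sh: report_targets is a hypothetical helper, and it only works out which servers should receive the results. It treats an empty flag file the same as a missing one, which is an assumption beyond the description above.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the attached failovernet.sh.
# report_targets is a hypothetical helper name.

# Print the space-separated list of Hobbit servers to report to.
# $1 = flag file (holds the primary server's IP while the primary is down)
# $2 = IP of the local Hobbit server
report_targets() {
    if [ -s "$1" ]; then
        # Flag file exists and is non-empty: report to local and primary.
        echo "$2 $(cat "$1")"
    else
        # No flag file (or an empty one): report to the local server only.
        echo "$2"
    fi
}
```

The wrapper would then apply that list to its reporting settings before starting the usual network test command. Which variables to set for that depends on your Hobbit configuration and is not shown here.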
The alert failover is different, because Hobbit doesn't have a separate BBPAGER server - alerts are sent from the same host that handles the Hobbit data collection and webpages. A solution to this has been implemented for the next release, where the alerting module can be distributed onto multiple servers, but only one of them will send alerts at any given time.
Regards, Henrik
So I take it that Joe has to Paypal Henrik $64 now?
Please let me, and everyone else of course, know how the failover script works on Hobbit. I'd be very interested in knowing the result to this!
Thanks to all three of you!
Yes, but keep in mind that's $64 octal.
Joe
Josh Luthman wrote:
So I take it that Joe has to Paypal Henrik $64 now?
Please let me, and everyone else of course, know how the failover script works on Hobbit. I'd be very interested in knowing the result to this!
Thanks to all three of you!
Aw...don't be cheap...go ahead and kick in the other $12...
Galen Johnson wrote:
Aw...don't be cheap...go ahead and kick in the other $12...
OK you win - $40 hex, as soon as I get paid.
Joe
Just offer $1000000 binary...man, I'm on geek overload...
On 11/2/07, Galen Johnson <Galen.Johnson at sas.com> wrote:
Just offer $1000000 binary...man, I'm on geek overload...
::shakes head::
This looks promising, I'll give it a whirl -
Joe
Henrik Stoerner wrote:
The alert failover is different, because Hobbit doesn't have a separate BBPAGER server - alerts are sent from the same host that handles the Hobbit data collection and webpages. A solution to this has been implemented for the next release, where the alerting module can be distributed onto multiple servers, but only one of them will send alerts at any given time.
So the distributed alerting capability will be in the soon-to-be-released 4.3.0?
Joe
On Fri, Nov 02, 2007 at 02:48:32PM -0700, Sloan wrote:
So the distributed alerting capability will be in the soon-to-be-released 4.3.0?
Yes.
Henrik
Hi Henrik,
I just set up a backup server for our primary Hobbit server yesterday, inspired by this:
From Henrik: I run two completely separate systems in parallel, and have the clients report to both of them. The system at our disaster center has the paging module disabled (just disable the [bbpage] section in hobbitlaunch.cfg), to avoid double alerts - it is simple to activate it, if necessary.
(That switch-over is manual, I think.)
That is my current setup. However, you have just posted a failover script that makes the failover transition automatic. The concept is the same, but will the clients still report to both Hobbit servers when there is no failover?
Thanks and regards, Ryan
On Sat, Nov 03, 2007 at 02:27:42PM +0800, Ryan Jay B. Lapuz wrote:
I just set up a backup server for our primary Hobbit server yesterday, inspired by this:
From Henrik: I run two completely separate systems in parallel, and have the clients report to both of them. The system at our disaster center has the paging module disabled (just disable the [bbpage] section in hobbitlaunch.cfg), to avoid double alerts - it is simple to activate it, if necessary.
(That switch-over is manual, I think.) That is my current setup. However, you have just posted a failover script that makes the failover transition automatic. The concept is the same, but will the clients still report to both Hobbit servers when there is no failover?
The failover script is used in a scenario where you have two servers running the network tests - and ONLY the network tests, not the Hobbit display - each reporting to their own Hobbit server. If the primary network test server goes down, you want the backup server to automatically start feeding the test results to both Hobbit servers. In that case it becomes necessary to have a failover from the primary to the backup server for the network tests, and that is what the script I posted does.
But in both scenarios you'd probably want to have the "full picture" of the health of your systems, so you do need to have the data from the clients available, both on the primary and on the backup system. Therefore the clients should be configured to send data to both systems.
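As a rough illustration of clients sending data to both systems, a client-side settings fragment could look like the one below. It assumes the stock Hobbit client settings BBDISP and BBDISPLAYS behave as in a standard install. 10.0.0.1 is the primary server address used earlier in this thread, and 10.0.0.2 is a placeholder for the backup server.

```shell
# Client settings fragment (sketch); both addresses are placeholders.
BBDISP="0.0.0.0"                 # 0.0.0.0: report to the BBDISPLAYS list instead
BBDISPLAYS="10.0.0.1 10.0.0.2"   # primary and backup Hobbit servers
```

With this in place each client report goes to both servers, so either one holds the full picture regardless of the failover state.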
Henrik
I tried to set up two dummy hosts to use as clone masters for critical systems.
The first one is for the conn test (Conn_Host_P1), and I added a bunch of systems to it.
That worked fine.
Next I created another clone master called something else (Proc_Host_P1). I used a system that was already set in (Conn_Host_P1), and I added it to Proc_Host_P1. However, now the added system disappears from Conn_Host_P1 as a clone and only appears in Proc_Host_P1.
Am I understanding correctly that a system can be cloned from only one master item? I wanted master tests for conn and procs and a few other items.
I am using the web page Edit Critical Systems to do this.
The file permissions are set as hobbit owner and webxxx group in server/etc/hobbit-nkview.cfg.
Any insight would be appreciated.
Tom
On 20071102, Henrik Stoerner wrote:
The alert failover is different, because Hobbit doesn't have a separate BBPAGER server - alerts are sent from the same host that handles the Hobbit data collection and webpages. A solution to this has been implemented for the next release, where the alerting module can be distributed onto multiple servers, but only one of them will send alerts at any given time.
Could you elaborate on this alerting module configuration, or point me to TFM? If it has indeed been implemented, it sounds like it would be the key to enabling hobbit to emulate the big brother style alerting failover behavior.
Joe
participants (8): Galen.Johnson@sas.com, gumby3203@gmail.com, henrik@hswn.dk, joe@tmsusa.com, josh@imaginenetworksllc.com, rlapuz@fcpp.fujitsu.com, thansmann@directpointe.com, Tom.Stewart@landsend.com