So, how are others doing this? I have a server set up here in my primary data center. We're monitoring a few thousand hosts right now with a large number of custom externals.
I've been tasked with setting up a fail-over or disaster response server in case our primary data center has issues. All of our clients are currently configured to send their messages to the IP address of our primary server.
Now, I could just copy the bb-hosts file to the DR site, but then I would only get the network tests since the clients all report to the primary.
Would I use bbproxy to do this? But if I install bbproxy on the primary, I won't be proxying the messages if my primary goes down... :(
-- Stewart Larsen
On Tue, Oct 23, 2007 at 02:18:16PM -0400, Stewart L wrote:
> Now, I could just copy the bb-hosts file to the DR site, but then I would only get the network tests since the clients all report to the primary.
I run two completely separate systems in parallel, and have the clients report to both of them. The system at our disaster center has the paging module disabled (just disable the [bbpage] section in hobbitlaunch.cfg), to avoid double alerts - it is simple to activate it, if necessary.
Config files are rsync'ed from the primary site to the disaster site regularly.
Regards, Henrik
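For anyone wondering what "clients report to both of them" amounts to on the wire, here is a minimal Python sketch. It is not the Hobbit client's own code. It assumes the plain-text status message is delivered over TCP port 1984 (the conventional Big Brother/Hobbit port); the names `send_tcp` and `report_to_all` and any addresses you pass in are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable

BB_PORT = 1984  # assumption: the conventional Big Brother / Hobbit listener port


def send_tcp(server: str, message: str, timeout: float = 5.0) -> None:
    """Deliver one message to one server over a short-lived TCP connection."""
    with socket.create_connection((server, BB_PORT), timeout=timeout) as conn:
        conn.sendall(message.encode("utf-8"))


def report_to_all(
    message: str,
    servers: Iterable[str],
    send: Callable[[str, str], None] = send_tcp,
) -> Dict[str, bool]:
    """Send the same message to every server.

    A failure on one server (for example the primary being down) is
    recorded but does not stop delivery to the remaining servers.
    """
    results: Dict[str, bool] = {}
    for server in servers:
        try:
            send(server, message)
            results[server] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results[server] = False
    return results
```

Calling `report_to_all("status www,example,com.cpu green load is fine", ["192.0.2.10", "198.51.100.10"])` would try both (made-up) addresses in turn. The point of the design is that the two servers are independent: nothing sits between the client and either server that could fail on its own.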
Date: Tue, 23 Oct 2007 22:02:34 +0200
From: henrik at hswn.dk
To: hobbit at hswn.dk
Subject: Re: [hobbit] Fail over?

> I run two completely separate systems in parallel, and have the clients
> report to both of them. The system at our disaster center has the paging
> module disabled (just disable the [bbpage] section in hobbitlaunch.cfg),
> to avoid double alerts - it is simple to activate it, if necessary.
I was thinking of using Sun Cluster (hb on Solaris) or Heartbeat (hb on Linux), but then how can I configure the cluster solution to fail over from one site (Florida) to another (New York)?
I believe this setup is the simplest failover solution; its only cost is the extra network bandwidth used to reach the secondary hb server.
tj
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
I believe you could use something like a proxy (Squid maybe?) for clients to connect to and then use one or the other. I'm not familiar at all with squid itself so I may be completely off, but a load balancer does sound like an option.
-- Josh Luthman Office: 937-552-2340 Direct: 937-552-2343 1100 Wayne St Suite 1337 Troy, OH 45373
Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer
I'm running snapshot 4.3.
hobbitd_channel crashes twice a day, every day, at line 112 in the loadhosts file. This is the line in question:
sprintf(newp->pagepath, "%s/%s", curtoppage->pagepath, name);
This is running on Solaris 10 x86, compiled with
./configure.server --rrdinclude /sw/include --rrdlib /sw/lib --pcreinclude /sw/include --pcrelib /sw/lib --sslinclude /sw/include/openssl --ssllib /sw/ssl/lib
I don't see why it's crashing at all. Any ideas?
Reading hobbitd_channel
core file header read successfully
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libc.so.1
program terminated by signal ABRT (Abort)
0xfee60717: __lwp_kill+0x0007: jae __lwp_kill+0x15 [ 0xfee60725, .+0xe ]
Current function is bbh_item
  466   p = strrchr(host->page->pagetitle, '/');
(dbx) where
  [1] __lwp_kill(0x1, 0x6), at 0xfee60717
  [2] _thr_kill(0x1, 0x6), at 0xfee5ded4
  [3] raise(0x6), at 0xfee0ced3
  [4] abort(0x8071c20, 0x0, 0x8046758, 0xfee4dd4f, 0x8046758, 0xfee4dd4f), at 0xfedf0969
  [5] 0x80581fe(0xb, 0x0, 0x80467f0), at 0x80581fe
  [6] __sighndlr(0xb, 0x0, 0x80467f0, 0x80581d0), at 0xfee5fadf
  [7] call_user_handler(0xb, 0x0, 0x80467f0), at 0xfee560d3
  [8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253
  ---- called from signal handler with signal 11 (SIGSEGV) ------
=>[9] bbh_item(hostin = 0x80739a8, item = BBH_NET), line 466 in "loadhosts.c"
  [10] load_hostnames(bbhostsfn = (nil), extrainclude = 0x8046ddc "hobbitd_channel", fqdn = 134508012), line 112 in "loadhosts_file.c"
As you're on Solaris, could you do a dtrace on it?
On Wed, Oct 24, 2007 at 11:05:16AM -0400, Sean R. Clark wrote:
> [8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253
> ---- called from signal handler with signal 11 (SIGSEGV) ------
> =>[9] bbh_item(hostin = 0x80739a8, item = BBH_NET), line 466 in "loadhosts.c"
> [10] load_hostnames(bbhostsfn = (nil), extrainclude = 0x8046ddc "hobbitd_channel", fqdn = 134508012), line 112 in "loadhosts_file.c"
This trace doesn't make sense - the "bbh_item()" function isn't called from the "load_hostnames()" function. So I think there's some corruption of the stack involved.
Either that, or the binary you're running doesn't match the source code you have (i.e. your source files were not used to compile the binary that is running).
If you load the binary and core into gdb as you did to get the stack trace, could you then do this:

gdb> fr 10

This should print out that you're now at stackframe #10, which is the "load_hostnames" routine.

gdb> p *inbuf
gdb> p name
gdb> p title

These print out the value of a number of variables.

gdb> fr 9
gdb> p *hostin
Regards, Henrik
Ahh, you are correct: my binary and source did not match.
Here is the stack trace from the (correct) binary (it's still crashing).
All of them show:
program terminated by signal ABRT (Abort)
0xfee60717: __lwp_kill+0x0007: jae __lwp_kill+0x15 [ 0xfee60725, .+0xe ]
Current function is sigsegv_handler
   58   abort();
(dbx) where
  [1] __lwp_kill(0x1, 0x6), at 0xfee60717
  [2] _thr_kill(0x1, 0x6), at 0xfee5ded4
  [3] raise(0x6), at 0xfee0ced3
  [4] abort(0x8071c20, 0x0, 0x8046758, 0xfee4dd4f, 0x8046758, 0xfee4dd4f), at 0xfedf0969
=>[5] sigsegv_handler(signum = 11), line 58 in "sig.c"
  [6] __sighndlr(0xb, 0x0, 0x80467f0, 0x80581d0), at 0xfee5fadf
  [7] call_user_handler(0xb, 0x0, 0x80467f0), at 0xfee560d3
  [8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [9] main(argc = 4, argv = 0x8046b28), line 678 in "hobbitd_channel.c"
From this:
/*
* Try to fork a child to send in an alarm message.
* If the fork fails, then just attempt to exec() the BB command
*/
Do you have any commands I can run in gdb or dbx to help further?
The name and inbuf variables are not defined when I try it with the correct binary and core.
-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: Thursday, October 25, 2007 4:16 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] hobbitd_channel still crashing everyday

> Either that, or the binary you're running doesn't match the source code you have
On Thu, Oct 25, 2007 at 11:42:19AM -0400, Sean R. Clark wrote:
> Ahh you are correct, my binary + source did not match
>
> Here is the stack trace from the (correct) binary (it's still crashing)
> ---- called from signal handler with signal 11 (SIGSEGV) ------
> [9] main(argc = 4, argv = 0x8046b28), line 678 in "hobbitd_channel.c"
Thanks, the line number isn't quite right, but I think this patch should fix it. However, it should only happen if the worker process (hobbitd_alert, hobbitd_rrd, hobbitd_history) cannot keep up with the flow of incoming messages, so there might be a different problem with your setup that triggers this. That would also explain why you see it regularly, and others do not.
Anyway, let me know if this patch stops it from crashing.
Regards, Henrik
Just to follow up
Since applying the patch it's been stable
Thanks for the patch
-Sean
-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: Thursday, October 25, 2007 4:42 PM
To: hobbit at hswn.dk
Subject: Re: [hobbit] hobbitd_channel still crashing everyday

> Anyway, let me know if this patch stops it from crashing.
Using Squid to load balance between the two servers sounds like an interesting idea. I need to do something similar in our lab and have been trying to figure out the best way to do it.
Isn't a proxy another SPOF (single point of failure)?
T.J. Yang
Date: Wed, 24 Oct 2007 12:23:20 -0400
From: paulehr at gmail.com
To: hobbit at hswn.dk
Subject: Re: [hobbit] Fail over?

> That sounds like an interesting idea to use squid to load balance between the two servers.
Yes, it is.
I've spoken with our infrastructure and support team and they will be reconfiguring the client on all of their machines to point to both servers. We already have the DR server up and running; I just need to script the daily copy of the config files.
Thanks for all the input, folks!
Stewart
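The daily copy of the config files could be scripted along these lines. This is only a sketch under stated assumptions: the directory `/etc/hobbit`, the host name `dr.example.com` and the function name `build_rsync_command` are all made up for illustration. Note that hobbitlaunch.cfg is deliberately left out of the file list, since the DR copy is the one with the [bbpage] section disabled and overwriting it would re-enable paging there.

```python
from typing import List, Sequence


def build_rsync_command(
    files: Sequence[str],
    source_dir: str,
    dr_host: str,
    dest_dir: str,
) -> List[str]:
    """Build an rsync argument list that pushes the named files to the DR host.

    -a preserves permissions and timestamps, -z compresses data in transit.
    """
    base = source_dir.rstrip("/")
    sources = [f"{base}/{name}" for name in files]
    destination = f"{dr_host}:{dest_dir.rstrip('/')}/"
    return ["rsync", "-az", *sources, destination]


# Hypothetical file list and locations; adjust to the real installation.
SYNC_FILES = ["bb-hosts", "hobbit-nkview.cfg"]
COMMAND = build_rsync_command(SYNC_FILES, "/etc/hobbit", "dr.example.com", "/etc/hobbit")
```

From a daily cron job, the list in `COMMAND` could be handed to `subprocess.run(COMMAND, check=True)` so that a failed copy exits non-zero and gets noticed.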
I am just setting this up and I am wondering if I can vi/sed the configuration file hobbit-nkview.cfg? I noticed that the file is ordered, so should I also sort the file?
I have to add a lot of systems, and the host clone still takes some time even if I add multiple hosts at the same time.
Thank you, Tom
I have manually edited the hobbit-nkview.cfg file without problems. Be aware, however, that it is more format-specific than the other hobbit configuration files, and if you typo/etc, it will probably break some functionality of the critical systems feature. I have been careful enough to not make any errors, so I'm not sure exactly what could break. But as long as you are careful, it's fine.
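If you would rather script the bulk additions than edit by hand, something like the following Python sketch keeps a backup and keeps the file sorted. The assumptions are mine, not something confirmed in this thread: each non-blank line is treated as one opaque entry, "ordered" is taken to mean plain lexical order, and the function names `merge_entries` and `add_entries` are hypothetical. The script does not know or check the entry format, so the lines you feed it still have to be correct.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def merge_entries(existing: Iterable[str], additions: Iterable[str]) -> List[str]:
    """Combine old and new lines, drop blanks and duplicates, sort lexically."""
    entries = {line.strip() for line in existing if line.strip()}
    entries.update(line.strip() for line in additions if line.strip())
    return sorted(entries)


def add_entries(cfg_path: Path, additions: Iterable[str]) -> Path:
    """Back up cfg_path, then rewrite it with the merged, sorted entries.

    Returns the path of the backup so a bad edit can be rolled back.
    """
    backup = cfg_path.with_name(cfg_path.name + ".bak")
    shutil.copy2(cfg_path, backup)
    merged = merge_entries(cfg_path.read_text().splitlines(), additions)
    cfg_path.write_text("\n".join(merged) + "\n")
    return backup
```

Trying it on a copy of the file first, and comparing the result against the original, would be the cautious way to find out whether the sorted order actually matters.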
The way I would approach something critical like this is to work on a secondary system that duplicates it. Anything will work really - if you can head to a flea market or something, even a 486 could work =)
On Wed, October 24, 2007 13:44, T.J. Yang wrote:
> Isn't a proxy another SPOF (single point of failure)?
Well, yup, a single proxy would be. And there are all kinds of options for putting up multiple proxies and setting up high availability/load balancing amongst _those_. This whole thing has kind of transmogrified from its beginnings as a discussion of replication and (possibly manual) failover of a Hobbit server.
Not that there's anything wrong with that.
On 10/24/07, T.J. Yang <tj_yang at hotmail.com> wrote:
> Isn't a proxy another SPOF (single point of failure)?
Yes, but a proxy doesn't have to be as complicated as a whole Hobbit server. It wouldn't even necessarily have to have disk drives - in a crunch you could probably run a Linux distro off a LiveCD or USB stick, with Squid or Tinyproxy built in.
Ralph Mitchell
I think the popular method of doing this around here is to use a USB thumb drive and a VM. I'd suggest making a duplicate of this. I'm sure you'll have a spare PC to plug it into already, but I wanted to point it out. Having two of everything on cold swap will definitely help you keep things running.
Josh
Will the old BB client report to two servers? Or do I need to upgrade to the hobbit client?
Hi, you might use a virtual IP for the Hobbit server, which could be assigned to another machine if necessary. If both servers use a common RAID, the data could be stored on the RAID so that even the history would be there. This is what we do.
Rolf
--
With kind regards,
Rolf Schrittenlocher

HRZ/BDV, Senckenberganlage 31, 60054 Frankfurt
Tel: (49) 69 - 798 28908
Fax: (49) 69 798 28817
LBS: lbs-f at mlist.uni-frankfurt.de
Personal: schritte at rz.uni-frankfurt.de
participants (11)
- gumby3203@gmail.com
- henrik@hswn.dk
- hobbit@epperson.homelinux.net
- josh@imaginenetworksllc.com
- paulehr@gmail.com
- ralphmitchell@gmail.com
- Schrittenlocher@rz.uni-frankfurt.de
- sclark@nyroc.rr.com
- stewartl42@gmail.com
- tj_yang@hotmail.com
- Tom.Stewart@landsend.com