Fail over?

stewartl42＠gmail.com

23 Oct 2007

6:18 p.m.

So, how are others doing this? I have a server set up here in my primary data center. We're monitoring a few thousand hosts right now with a large number of custom externals.

I've been tasked with setting up a fail-over or disaster response server in case our primary data center has issues. All of our clients are currently configured to send their messages to the IP address of our primary server.

Now, I could just copy the bb-hosts file to the DR site, but then I would only get the network tests since the clients all report to the primary.

Would I use bbproxy to do this? But if I install bbproxy on the primary, I won't be proxying the messages if my primary goes down... :(

-- Stewart Larsen


henrik＠hswn.dk

23 Oct

8:02 p.m.

New subject: [hobbit] Fail over?

On Tue, Oct 23, 2007 at 02:18:16PM -0400, Stewart L wrote:

> So, how are others doing this? I have a server set up here in my primary data center. We're monitoring a few thousand hosts right now with a large number of custom externals.
>
> I've been tasked with setting up a fail-over or disaster response server in case our primary data center has issues. All of our clients are currently configured to send their messages to the IP address of our primary server.
>
> Now, I could just copy the bb-hosts file to the DR site, but then I would only get the network tests since the clients all report to the primary.

I run two completely separate systems in parallel, and have the clients report to both of them. The system at our disaster center has the paging module disabled (just disable the [bbpage] section in hobbitlaunch.cfg), to avoid double alerts - it is simple to activate it, if necessary.

Config files are rsync'ed from the primary site to the disaster site regularly.

Regards, Henrik
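Henrik's approach has every client send each message to both servers, so the DR box builds its own independent view. The fan-out pattern can be sketched in Python as below. This is an illustration only, not the BB or Hobbit client code: it assumes the default port 1984, the traditional BB `status HOST.TEST COLOR text` message layout with dots in the host name written as commas, and a simple connect-send-close exchange. The real clients are configured separately for where they send messages.

```python
import socket

BB_PORT = 1984  # assumed default Big Brother / Hobbit listening port


def status_message(host, test, color, text):
    """Build a BB-style status line (assumed format: status HOST.TEST COLOR text).

    Dots in the host name are written as commas, following the BB convention,
    so that the single remaining dot separates host from test name.
    """
    return f"status {host.replace('.', ',')}.{test} {color} {text}"


def send_tcp(server, message, port=BB_PORT, timeout=5.0):
    """Connect to one server, send the whole message, then close the socket."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8"))


def report_to_all(servers, message, send=send_tcp):
    """Send the same message to every server in turn.

    A failure on one server (e.g. the primary site is down) is recorded and
    does not stop delivery to the others. Returns {server: None or exception}.
    """
    results = {}
    for server in servers:
        try:
            send(server, message)
            results[server] = None
        except OSError as exc:  # refused, unreachable, DNS failure, timeout
            results[server] = exc
    return results
```

For example, `report_to_all(["hobbit-primary.example.com", "hobbit-dr.example.com"], status_message("web01.example.com", "custom", "green", "all OK"))` would try both (hypothetical) servers. The cost is the doubled message traffic T.J. mentions below.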

tj_yang＠hotmail.com

24 Oct

10:24 a.m.

New subject: [hobbit] Fail over?

On Tue, 23 Oct 2007 22:02:34 +0200, henrik at hswn.dk wrote:

> I run two completely separate systems in parallel, and have the clients report to both of them. The system at our disaster center has the paging module disabled (just disable the [bbpage] section in hobbitlaunch.cfg), to avoid double alerts - it is simple to activate it, if necessary.

I was thinking of using Sun Cluster (hb on Solaris) or Heartbeat (hb on Linux), but how can I configure a cluster solution to fail over from one site (Florida) to another (New York)?

I believe this setup is the simplest failover solution, at the sole expense of extra network bandwidth usage to the secondary hb server.


josh＠imaginenetworksllc.com

2:15 p.m.

New subject: [hobbit] Fail over?

I believe you could use something like a proxy (Squid maybe?) for clients to connect to and then use one or the other. I'm not familiar at all with squid itself so I may be completely off, but a load balancer does sound like an option.


-- Josh Luthman Office: 937-552-2340 Direct: 937-552-2343 1100 Wayne St Suite 1337 Troy, OH 45373

Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer
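Josh's proxy idea amounts to a relay that clients talk to, which forwards to whichever Hobbit server is up. One hedged caveat: Squid is an HTTP proxy, while BB/Hobbit clients send their messages over a plain TCP connection (port 1984 by default), so a generic TCP relay may be the closer fit. The hypothetical Python sketch below tries the backends in order (primary first, DR second) and forwards one message to the first that accepts a connection. It does not pass back any reply the server sends, so it would not suit message types that expect an answer.

```python
import socket

BB_PORT = 1984  # assumed default BB/Hobbit port on both backends


def connect_first_available(backends, port=BB_PORT, timeout=3.0,
                            connect=socket.create_connection):
    """Return (backend, socket) for the first backend accepting a TCP connection.

    Backends are tried in the order given, so list the primary first and the
    DR server second. Raises ConnectionError if none can be reached.
    """
    errors = []
    for backend in backends:
        try:
            return backend, connect((backend, port), timeout=timeout)
        except OSError as exc:  # refused, unreachable, DNS failure, timeout
            errors.append(f"{backend}: {exc}")
    raise ConnectionError("no backend reachable: " + "; ".join(errors))


def relay_message(data, backends, **kwargs):
    """Forward one client message (raw bytes) to the first reachable backend.

    Any reply from the server is ignored; returns the backend that was used.
    """
    backend, sock = connect_first_available(backends, **kwargs)
    with sock:
        sock.sendall(data)
    return backend
```

Ordered failover rather than round-robin is deliberate here: splitting messages between two servers would leave each with only part of the picture. As T.J. points out below, the relay itself is a single point of failure.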

sclark＠nyroc.rr.com

3:05 p.m.

New subject: hobbitd_channel still crashing everyday

Running snapshot 4.3.

It crashes twice a day, every day, at line 112 in loadhosts_file.c. The line in question is:

    sprintf(newp->pagepath, "%s/%s", curtoppage->pagepath, name);

This is running on Solaris 10 x86, compiled with:

    ./configure.server --rrdinclude /sw/include --rrdlib /sw/lib --pcreinclude /sw/include --pcrelib /sw/lib --sslinclude /sw/include/openssl --ssllib /sw/ssl/lib

I don't see why it's crashing at all. Any ideas?

    Reading hobbitd_channel
    core file header read successfully
    Reading ld.so.1
    Reading libresolv.so.2
    Reading libsocket.so.1
    Reading libnsl.so.1
    Reading libc.so.1
    program terminated by signal ABRT (Abort)
    0xfee60717: __lwp_kill+0x0007: jae __lwp_kill+0x15 [ 0xfee60725, .+0xe ]
    Current function is bbh_item
      466   p = strrchr(host->page->pagetitle, '/');
    (dbx) where
      [1] __lwp_kill(0x1, 0x6), at 0xfee60717
      [2] _thr_kill(0x1, 0x6), at 0xfee5ded4
      [3] raise(0x6), at 0xfee0ced3
      [4] abort(0x8071c20, 0x0, 0x8046758, 0xfee4dd4f, 0x8046758, 0xfee4dd4f), at 0xfedf0969
      [5] 0x80581fe(0xb, 0x0, 0x80467f0), at 0x80581fe
      [6] __sighndlr(0xb, 0x0, 0x80467f0, 0x80581d0), at 0xfee5fadf
      [7] call_user_handler(0xb, 0x0, 0x80467f0), at 0xfee560d3
      [8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253
      ---- called from signal handler with signal 11 (SIGSEGV) ------
    =>[9] bbh_item(hostin = 0x80739a8, item = BBH_NET), line 466 in "loadhosts.c"
      [10] load_hostnames(bbhostsfn = (nil), extrainclude = 0x8046ddc "hobbitd_channel", fqdn = 134508012), line 112 in "loadhosts_file.c"

josh＠imaginenetworksllc.com

3:11 p.m.

New subject: [hobbit] hobbitd_channel still crashing everyday

As you're on Solaris, could you do a dtrace on it?


henrik＠hswn.dk

25 Oct

8:16 a.m.

New subject: [hobbit] hobbitd_channel still crashing everyday

On Wed, Oct 24, 2007 at 11:05:16AM -0400, Sean R. Clark wrote:

> [8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253
> ---- called from signal handler with signal 11 (SIGSEGV) ------
> =>[9] bbh_item(hostin = 0x80739a8, item = BBH_NET), line 466 in "loadhosts.c"
> [10] load_hostnames(bbhostsfn = (nil), extrainclude = 0x8046ddc "hobbitd_channel", fqdn = 134508012), line 112 in "loadhosts_file.c"

This trace doesn't make sense - the "bbh_item()" function isn't called from the "load_hostnames()" function. So I think there's some corruption of the stack involved.

Either that, or the binary you're running doesn't match the source code you have (i.e., your source files were not used to compile the binary that is running).

If you load the binary and core into gdb as you did to get the stack trace, could you then do this:

    gdb> fr 10

This should print out that you're now at stack frame #10, which is the "load_hostnames" routine.

    gdb> p *inbuf
    gdb> p name
    gdb> p title

These print out the values of a number of variables.

    gdb> fr 9
    gdb> p *hostin

Regards, Henrik

sclark＠nyroc.rr.com

3:42 p.m.

New subject: [hobbit] hobbitd_channel still crashing everyday

Ahh, you are correct: my binary and source did not match.

Here is the stack trace from the (correct) binary (it's still crashing).

All of them show:

    program terminated by signal ABRT (Abort)
    0xfee60717: __lwp_kill+0x0007: jae __lwp_kill+0x15 [ 0xfee60725, .+0xe ]
    Current function is sigsegv_handler
       58   abort();
    (dbx) where
      [1] __lwp_kill(0x1, 0x6), at 0xfee60717
      [2] _thr_kill(0x1, 0x6), at 0xfee5ded4
      [3] raise(0x6), at 0xfee0ced3
      [4] abort(0x8071c20, 0x0, 0x8046758, 0xfee4dd4f, 0x8046758, 0xfee4dd4f), at 0xfedf0969
    =>[5] sigsegv_handler(signum = 11), line 58 in "sig.c"
      [6] __sighndlr(0xb, 0x0, 0x80467f0, 0x80581d0), at 0xfee5fadf
      [7] call_user_handler(0xb, 0x0, 0x80467f0), at 0xfee560d3
      [8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253
      ---- called from signal handler with signal 11 (SIGSEGV) ------
      [9] main(argc = 4, argv = 0x8046b28), line 678 in "hobbitd_channel.c"

...

From this:

    /*
     * Try to fork a child to send in an alarm message.
     * If the fork fails, then just attempt to exec() the BB command
     */

Do you have any commands I can run in gdb or dbx to help further?

The name & inbuf are not defined when I try it with the correct binary + core


henrik＠hswn.dk

8:42 p.m.

New subject: [hobbit] hobbitd_channel still crashing everyday

On Thu, Oct 25, 2007 at 11:42:19AM -0400, Sean R. Clark wrote:

> Ahh you are correct, my binary + source did not match
>
> Here is the stack trace from the (correct) binary (it's still crashing)
> ---- called from signal handler with signal 11 (SIGSEGV) ------
> [9] main(argc = 4, argv = 0x8046b28), line 678 in "hobbitd_channel.c"

Thanks. The line number isn't quite right, but I think this patch should fix it. However, it should only happen if a worker process (hobbitd_alert, hobbitd_rrd, hobbitd_history) cannot keep up with the flow of incoming messages, so there might be a different problem in your setup that triggers this. That would also explain why you see it regularly and others do not.

Anyway, let me know if this patch stops it from crashing.

Regards, Henrik
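Henrik's patch itself is not included in the archive, so the following is not what it does. It only illustrates the general producer/consumer problem he describes: when a worker such as hobbitd_rrd falls behind, whatever sits between the incoming message stream and the worker must block, buffer, or drop, and a buffer that can grow without limit eventually fails. This hypothetical Python class keeps a bounded backlog and, as one possible policy, discards the oldest message when full while counting the losses.

```python
from collections import deque


class BoundedBacklog:
    """Hypothetical backlog between a fast message source and a slow worker.

    Holds at most `limit` messages. When full, the oldest message is discarded
    and counted, so memory use stays bounded instead of growing without limit.
    """

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.queue = deque(maxlen=limit)
        self.dropped = 0

    def put(self, message):
        if len(self.queue) == self.queue.maxlen:
            self.dropped += 1  # deque(maxlen=...) evicts the oldest item on append
        self.queue.append(message)

    def drain(self, max_items):
        """Hand up to max_items messages to the worker, oldest first."""
        batch = []
        while self.queue and len(batch) < max_items:
            batch.append(self.queue.popleft())
        return batch
```

The `dropped` counter is the operationally useful part: if it climbs steadily, the worker is undersized for the message rate, which fits Henrik's hint that a different problem in the setup may be the real trigger.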

sclark＠nyroc.rr.com

28 Oct

10:32 p.m.

New subject: [hobbit] hobbitd_channel still crashing everyday

Just to follow up:

Since applying the patch, it's been stable.

Thanks for the patch

-Sean


paulehr＠gmail.com

24 Oct

4:23 p.m.

New subject: [hobbit] Fail over?

That sounds like an interesting idea, using Squid to load balance between the two servers. I need to do something similar in our lab and have been trying to figure out the best way to do it.


tj_yang＠hotmail.com

5:44 p.m.

New subject: [hobbit] Fail over?

Isn't a proxy another SPOF (single point of failure)?

T.J. Yang


stewartl42＠gmail.com

5:58 p.m.

New subject: [hobbit] Fail over?

Yes, it is.

I've spoken with our infrastructure and support team, and they will be reconfiguring the client on all of their machines to point to both servers. We already have the DR server up and running; I just need to script the daily copy of the config files.

Thanks for all the input, folks!

Stewart
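The daily config copy Stewart needs to script, in the spirit of Henrik's rsync approach, can be sketched as a small Python wrapper around rsync. Assumptions, all hypothetical: ssh key login from the primary to the DR server already works, and the paths and host name below are placeholders for wherever the Hobbit server's etc/ directory lives. Excluding hobbitlaunch.cfg is a suggestion, so that the DR copy with its [bbpage] section disabled is not overwritten by the primary's version with paging enabled.

```python
import subprocess

# Hypothetical paths and host; replace with the real Hobbit server etc/ locations.
SRC_DIR = "/home/hobbit/server/etc/"  # trailing slash: copy the directory's contents
DR_TARGET = "hobbit@hobbit-dr.example.com:/home/hobbit/server/etc/"


def rsync_command(src, dest, excludes=("hobbitlaunch.cfg",), dry_run=False):
    """Build the rsync argument list.

    -a keeps permissions and timestamps, -z compresses in transit, and
    -e ssh runs the transfer over ssh. Files matching `excludes` are skipped,
    so the DR site's own hobbitlaunch.cfg (paging disabled) is left alone.
    """
    cmd = ["rsync", "-az", "-e", "ssh"]
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    if dry_run:
        cmd.append("--dry-run")  # show what would change without copying
    cmd += [src, dest]
    return cmd


def sync_config(src=SRC_DIR, dest=DR_TARGET, dry_run=False):
    """Run rsync once; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(rsync_command(src, dest, dry_run=dry_run), check=True)
```

Running it once with `dry_run=True` shows the file list before anything is copied; after that, `sync_config()` can be called daily from cron. Leaving out `--delete` means files removed on the primary stay on the DR side, which is the safer default at the cost of occasional manual cleanup.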

On 10/24/07, T.J. Yang <tj_yang at hotmail.com> wrote:

> Isn't a proxy another SPOF (single point of failure)?
>
> T.J. Yang


Tom.Stewart＠landsend.com

8:01 p.m.

New subject: Hobbit-nkview.cfg question

I'm just setting this up, and I'm wondering whether I can edit the configuration file hobbit-nkview.cfg directly with vi/sed. I noticed that the file is ordered; should I also sort it?

I have to add a lot of systems, and the host clone still takes some time even if I add multiple hosts at the same time.

Thank you, Tom

gumby3203＠gmail.com

8:38 p.m.

New subject: [hobbit] Hobbit-nkview.cfg question

I have manually edited the hobbit-nkview.cfg file without problems. Be aware, however, that it is more format-specific than the other Hobbit configuration files, and if you make a typo or similar mistake, it will probably break some functionality of the critical systems feature. I have been careful enough not to make any errors, so I'm not sure exactly what could break. But as long as you are careful, it's fine.
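Gary's advice to be careful when hand-editing hobbit-nkview.cfg can be backed by a cautious edit routine. The hypothetical Python helper below applies a transformation to the file's text, keeps a .bak backup, refuses to install output that fails a check, and swaps the new file in with an atomic rename. The thread does not describe the file's exact format, so no particular format check is assumed; `validate` is only a placeholder where one would go.

```python
import os
import shutil
import tempfile


def safe_rewrite(path, transform, validate=lambda text: True):
    """Rewrite a config file via transform(text) -> new_text.

    Keeps a .bak copy, refuses to install output that fails validate(),
    and swaps the new file in with an atomic rename so readers never see
    a half-written file.
    """
    with open(path, encoding="utf-8") as fh:
        old = fh.read()
    new = transform(old)
    if not validate(new):
        raise ValueError(f"refusing to write {path}: validation failed")
    shutil.copy2(path, path + ".bak")  # backup, preserving timestamps
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")  # same filesystem
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(new)
        shutil.copymode(path, tmp)  # keep the original permissions
        os.replace(tmp, path)       # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise
```

Whether the entries must also be kept sorted, as Tom asks, is not settled in the thread; if they must be, the sorting belongs in `transform`.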


josh＠imaginenetworksllc.com

8:49 p.m.

New subject: [hobbit] Hobbit-nkview.cfg question

For something critical like this, I would work on a secondary system that duplicates it. Anything will work really - if you can head to a flea market or something, even a 4x86 could work =)


hobbit＠epperson.homelinux.net

5:59 p.m.

New subject: [hobbit] Fail over?

On Wed, October 24, 2007 13:44, T.J. Yang wrote:

> Isn't a proxy another SPOF (single point of failure)?

Well, yup, a single proxy would be. And there are all kinds of options for putting up multiple proxies and setting up high availability/load balancing amongst _those_. This whole thing has kind of transmogrified from its beginnings as a discussion of replication and (possibly manual) failover of a Hobbit server.

Not that there's anything wrong with that.

ralphmitchell＠gmail.com

6:02 p.m.

New subject: [hobbit] Fail over?

On 10/24/07, T.J. Yang <tj_yang at hotmail.com> wrote:

> Isn't a proxy another SPOF (single point of failure)?

Yes, but a proxy doesn't have to be as complicated as a whole Hobbit server. It wouldn't even necessarily need disk drives - in a crunch you could probably run a Linux distro off a LiveCD or USB stick, with Squid or Tinyproxy built in.

Ralph Mitchell

josh＠imaginenetworksllc.com

6:09 p.m.

New subject: [hobbit] Fail over?

I think the popular method of doing this around here is to use a USB thumb drive and a VM. I'd suggest making a duplicate of this. I'm sure you'll have a spare PC to plug it into already, but I wanted to point it out. Having two of everything on cold swap will definitely help you keep things running.

Josh


stewartl42＠gmail.com

5:10 p.m.

New subject: [hobbit] Fail over?

Will the old BB client report to two servers? Or do I need to upgrade to the hobbit client?

On 10/23/07, Henrik Stoerner <henrik at hswn.dk> wrote:

> I run two completely separate systems in parallel, and have the clients report to both of them. The system at our disaster center has the paging module disabled (just disable the [bbpage] section in hobbitlaunch.cfg), to avoid double alerts - it is simple to activate it, if necessary.

Schrittenlocher＠rz.uni-frankfurt.de

5:47 a.m.

New subject: [hobbit] Fail over?

Hi, you might use a virtual IP for the Hobbit server, which could be assigned to another machine if necessary. If both servers share a common RAID, the data can be stored on the RAID, so even the history would be there. This is what we do.

Rolf
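Rolf's virtual-IP approach needs something on the standby to decide when to take the address over. In practice that is typically cluster software such as Heartbeat (mentioned by T.J. above) or a manual step; the hypothetical Python sketch below only shows the decision logic. `check` and `take_over` are placeholders supplied by the caller, for example a TCP probe of the primary's port 1984 and a command that brings the virtual IP up locally.

```python
class FailoverMonitor:
    """Decide when a standby should take over a virtual IP (hypothetical sketch).

    The standby polls the primary, and only after `threshold` consecutive
    failed checks does it call take_over(). A single successful check resets
    the count, so one dropped probe does not cause a takeover.
    """

    def __init__(self, check, take_over, threshold=3):
        self.check = check          # returns True if the primary looks healthy
        self.take_over = take_over  # e.g. brings the virtual IP up locally
        self.threshold = threshold
        self.failures = 0
        self.active = False         # True once this node has taken over

    def poll(self):
        """Run one health check; returns True once the standby is active."""
        if self.active:
            return True
        if self.check():
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.take_over()
                self.active = True
        return self.active
```

Requiring several consecutive failures keeps one lost probe from triggering a takeover. It does not solve split-brain: if the primary is alive but unreachable from the standby, both machines may claim the address, which is particularly risky when they share a RAID.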


-- Kind regards, Rolf Schrittenlocher

HRZ/BDV, Senckenberganlage 31, 60054 Frankfurt Tel: (49) 69 - 798 28908 Fax: (49) 69 798 28817 LBS: lbs-f at mlist.uni-frankfurt.de Persoenlich: schritte at rz.uni-frankfurt.de
