HA Hobbit / Load Balanced Hobbit / Failover Hobbit

25 May 2007

I've been thinking of implementing a fault-tolerant Hobbit setup. I've
got a few ideas on how to do it, so I thought I would throw them out to
see what Henrik and anyone else thinks...for all I know someone already
has it set up.

Setup 1 - Using bbproxy:
Clients configured to send to DNS name "hobbit.mydomain.com"
HOST-A, aka "hobbit.mydomain.com"

Running bbproxy, configured to send all received data to HOST-B
(hobbit1) and HOST-C (hobbit2)
Apache virtual host redirects HTTP requests for hobbit.mydomain.com to
HOST-B (hobbit1)
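
To make the HOST-A fan-out concrete, here is a rough Python sketch of what "send all received data to HOST-B and HOST-C" amounts to. This is not bbproxy itself, which does this for you. It assumes the usual convention of one message per TCP connection on port 1984; the function names and the target list are made up for illustration.

```python
import socket

# Hypothetical targets: the two Hobbit servers from the setup above.
TARGETS = [("hobbit1", 1984), ("hobbit2", 1984)]


def send_tcp(host, port, payload, timeout=5.0):
    """Send one message over its own TCP connection, then close."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # tell the receiver the message is complete


def fan_out(payload, targets=TARGETS, send=send_tcp):
    """Forward one client message to every target.

    Returns the targets that could not be reached, so that one server
    being down does not stop delivery to the other.
    """
    failed = []
    for host, port in targets:
        try:
            send(host, port, payload)
        except OSError:  # refused, timed out, name lookup failed, ...
            failed.append((host, port))
    return failed
```

The point of returning the failures instead of raising is that HOST-C keeps receiving data while HOST-B is down, which is what the failover below depends on.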

HOST-B aka "hobbit1". This is the main Hobbit server.

Configured to do network tests
Configured to send alerts
Configured to save checkpoint file every 5 minutes (maybe even every
minute)
Also monitors the bbproxy host and alerts if it goes down
Also monitors HOST-C (hobbit2) and alerts if it goes down
Web interface accessible via a DNS virtual host
(monitor.mydomain.com/hobbit)

HOST-C aka "hobbit2". This is the secondary/standby server.

Only runs network tests and alerts for HOST-B
Rsync process runs every minute and mirrors HOST-B's config files
(bb-hosts, hobbit-alerts.cfg, the checkpoint file, etc.) to a failover
directory.
Failover is accomplished via a SCRIPT directive in hobbit-alerts.cfg
When HOST-B goes down, an alert is sent, and an additional SCRIPT
directive kicks off a script which will:

Swap out HOST-C's config files (bb-hosts, hobbit-alerts.cfg,
checkpoint file, etc.) with the mirrored failover copies
Restart Hobbit to load the new checkpoint data
Update the Apache config on HOST-A so that the web interface now
redirects to HOST-C instead of HOST-B, and send Apache a HUP to activate
the new config.

HOST-C is now performing everything that HOST-B was. It is doing the
same network tests, has the same alert setup, and is receiving the same
data from bbproxy. Anyone who goes to
http://monitor.mydomain.com/hobbit will correctly be directed to the
secondary server.
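
The script kicked off by the SCRIPT directive could look roughly like the Python sketch below. It is only an outline under assumptions: the directory layout, the `.pre-failover` backup suffix and the function names are all invented here, and the commands that restart Hobbit on HOST-C and repoint Apache on HOST-A are passed in rather than spelled out, since they depend on the install.

```python
import shutil
import subprocess
from pathlib import Path


def swap_in(failover_dir, live_dir, names):
    """Replace live files with the mirrored copies, keeping a backup of each."""
    failover_dir, live_dir = Path(failover_dir), Path(live_dir)
    missing = [n for n in names if not (failover_dir / n).is_file()]
    if missing:
        # Refuse to half-apply a failover from an incomplete mirror.
        raise FileNotFoundError(f"mirror is missing: {missing}")
    for name in names:
        live = live_dir / name
        if live.exists():
            # Keep the standby's own file so a manual failback can restore it.
            shutil.copy2(live, live.with_name(name + ".pre-failover"))
        shutil.copy2(failover_dir / name, live)


def fail_over(failover_dir, live_dir, names, commands, run=subprocess.check_call):
    """Swap the files, then run the follow-up commands in order."""
    swap_in(failover_dir, live_dir, names)
    for cmd in commands:  # e.g. restart Hobbit, then update and HUP Apache
        run(cmd)
```

Checking the mirror before touching anything matters here because the rsync only runs once a minute, and HOST-B may have died partway through a transfer.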

Thoughts:
This is an automated failover, but failing back to the primary server
when it comes back up is not so easily done. Eh, I suppose with another
SCRIPT directive it could be done, but usually when a host fails it's not
ready to return to full service the first time it boots back up, so the
failback is probably best done manually. Failback would include rsyncing
the data directory and checkpoint file back to the primary host, and
reverting HOST-C and HOST-A (Apache) back to their pre-fail configurations.

I suppose this could be done without bbproxy, and just have the
secondary server constantly rsync the first server (data and etc
directories), but then you would lose the ability to have the SCRIPT
directive do the failover stuff for you (hmm, or maybe not: if the
directive was in HOST-B's config it would never trigger there, so it
should be safe to mirror that config). It's still nice to use bbproxy
though, because it rate-limits the incoming messages, and combines them
into combo messages when it is able, which reduces the load on the
Hobbit server.

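
For anyone unfamiliar with combo messages, the combining step is roughly the following, sketched in Python. I'm assuming the combo format is the word "combo" on its own line followed by the individual status messages separated by blank lines; the size limit and the function name are just illustrative, and bbproxy's actual batching rules may differ.

```python
def combine(messages, max_len=16384):
    """Group status messages into combo messages of at most max_len characters.

    A single message longer than max_len still gets a batch of its own.
    """
    header, sep = "combo\n", "\n\n"  # assumed combo framing
    batches, current, size = [], [], len(header)
    for msg in messages:
        msg = msg.rstrip("\n")
        extra = len(msg) + (len(sep) if current else 0)
        if current and size + extra > max_len:
            # Current batch is full: emit it and start a new one.
            batches.append(header + sep.join(current))
            current, size = [], len(header)
            extra = len(msg)
        current.append(msg)
        size += extra
    if current:
        batches.append(header + sep.join(current))
    return batches
```

With three short status messages and the default limit this yields one combo message, so the Hobbit server handles one connection instead of three.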
This only provides failover, no load balancing. This could change if
Henrik ever does the code changes to allow distributing worker modules.

I discovered while searching that someone has created a patch (or
patches) for rrdtool that allows it to store and retrieve data from
MySQL or Postgres instead of, or in addition to, RRD files. This is very
interesting as it would be transparent to Hobbit, and MySQL has
excellent load balancing and replication built in. The author claimed
that he had "re-written Big Brother" using these rrdtool patches to
create a load-balanced, fault-tolerant monitoring cluster. I wonder if
he hacked it so that all the configuration was stored in MySQL as
well...I guess it doesn't matter, as I got the impression that he didn't
want to share his work (except for the rrdtool patches) :-)

jonescr＠cisco.com
