HA Hobbit / Load Balanced Hobbit / Failover Hobbit
I've been thinking of implementing a fault-tolerant Hobbit setup. I've got a few ideas on how to do it, so I thought I would throw them out to see what Henrik and anyone else thinks... for all I know someone already has it set up*.
Setup 1 - Using bbproxy:
Clients configured to send to DNS name "hobbit.mydomain.com"
HOST-A, aka "hobbit.mydomain.com"
- Running bbproxy, configured to send all received data to HOST-B (hobbit1) and HOST-C (hobbit2)
- Apache virtual host redirects HTTP requests for hobbit.mydomain.com to HOST-B (hobbit1)
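To make the role of HOST-A concrete, here is a minimal Python sketch of the fan-out idea: every message received is delivered to both Hobbit servers, and one server being down does not stop delivery to the other. This is only an illustration and not how bbproxy itself is implemented. The function name fan_out and the host names in the usage comment are made up, and port 1984 is assumed to be the port the Hobbit servers listen on.

```python
import socket


def fan_out(message, targets, timeout=5.0):
    """Send the same message to every (host, port) target.

    Returns a dict mapping each target to None on success, or to the
    error text if that target could not be reached. A failure on one
    target does not prevent delivery to the others.
    """
    results = {}
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(message)
                # Half-close so the receiver sees end-of-message.
                conn.shutdown(socket.SHUT_WR)
            results[(host, port)] = None
        except OSError as exc:
            results[(host, port)] = str(exc)
    return results


# Hypothetical usage (host names and port are assumptions):
# fan_out(b"status myhost.test green all ok",
#         [("hobbit1", 1984), ("hobbit2", 1984)])
```

The point of returning per-target results is that the proxy host can keep feeding HOST-C even while HOST-B is down, which is what makes the later failover possible.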
HOST-B aka "hobbit1". This is the main Hobbit server.
- Configured to do network tests
- Configured to send alerts
- Configured to save checkpoint file every 5 minutes (maybe even every minute)
- Also monitors the bbproxy host and alerts if it goes down
- Also monitors HOST-C (hobbit2) and alerts if it goes down
- Web interface accessible via a DNS virtual host (monitor.mydomain.com/hobbit)
HOST-C aka "hobbit2". This is the secondary/standby server.
- Only runs network tests against, and sends alerts for, HOST-B
- An rsync process runs every minute and mirrors HOST-B's config files (bb-hosts, hobbit-alerts.cfg, the checkpoint file, etc.) to a failover directory.
- Failover is accomplished via SCRIPT directive in hobbit-alerts.cfg
- When HOST-B goes down, an alert is sent, and an additional SCRIPT directive kicks off a script which will:
  - Swap out HOST-C's config files (bb-hosts, hobbit-alerts.cfg, checkpoint file, etc.) with the saved mirrored failover ones
  - Restart Hobbit to load the new checkpoint data
  - Update the Apache config on HOST-A so that the web interface now redirects to HOST-C instead of HOST-B, and HUP Apache to activate the new config.
- HOST-C is now performing everything that HOST-B was. It is doing the same network tests, has the same alert setup, and is receiving the same data from bbproxy. Anyone who goes to http://monitor.mydomain.com/hobbit will correctly be directed to the secondary server.
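The script kicked off by the SCRIPT directive could look roughly like the Python sketch below. It is an outline under assumptions, not a tested failover tool: the function names are made up, the files are assumed to all live in one directory (the checkpoint file may well live elsewhere), and the commands that restart Hobbit and repoint Apache on HOST-A are passed in by the caller because they depend on the installation.

```python
import shutil
import subprocess
from pathlib import Path


def swap_in_mirrored_files(failover_dir, live_dir, names):
    """Copy each mirrored file over the live one.

    The standby's own copy is kept as <name>.standby so a manual
    failback can restore it. Files that were never mirrored are skipped.
    Returns the list of names actually swapped in.
    """
    failover_dir, live_dir = Path(failover_dir), Path(live_dir)
    swapped = []
    for name in names:
        src = failover_dir / name
        dst = live_dir / name
        if not src.exists():
            continue
        if dst.exists():
            shutil.copy2(dst, dst.with_name(name + ".standby"))
        shutil.copy2(src, dst)
        swapped.append(name)
    return swapped


def fail_over(failover_dir, live_dir, names, restart_cmd, repoint_cmd,
              run=subprocess.run):
    """Run the three failover steps in order.

    restart_cmd and repoint_cmd are site-specific command lists
    (assumptions): one restarts Hobbit on this host, the other updates
    and reloads Apache on the proxy host.
    """
    swapped = swap_in_mirrored_files(failover_dir, live_dir, names)
    run(restart_cmd, check=True)   # load the mirrored checkpoint data
    run(repoint_cmd, check=True)   # send web users to this host
    return swapped
```

Keeping the standby's original files as .standby copies is a design choice made with the manual failback in mind: reverting HOST-C is then a matter of copying them back and restarting.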
Thoughts: This is an automated failover, but failing back to the primary server when it comes back up is not so easily done. Eh, I suppose with another SCRIPT directive it could be done, but usually when a host fails it's not ready to return to full service the first time it boots back up, so the failback is probably best done manually. Failback would include rsyncing the data directory and checkpoint file back to the primary host, and reverting HOST-C and HOST-A (Apache) back to their pre-fail configurations.
I suppose this could be done without bbproxy, by just having the secondary server constantly rsync the first server (data and etc directories), but then you would lose the ability to have the SCRIPT directive do the failover stuff for you (hmm, or maybe not: if it was in HOST-B's config it would never trigger, so it should be safe to mirror that config). It's still nice to use bbproxy though, because it rate-limits the incoming messages and combines them into combo messages when it is able, which reduces the load on the Hobbit server.
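For the mirroring itself, a small wrapper around rsync run from cron on the secondary would probably do. The Python sketch below only builds and runs the rsync command; the "hobbit" user and the paths in the usage comment are assumptions, and --delete is included on the assumption that a true mirror is wanted (files removed on the primary also disappear from the copy).

```python
import subprocess


def rsync_mirror_cmd(primary, remote_dir, local_dir, user="hobbit"):
    """Build an rsync command that mirrors remote_dir on the primary
    into local_dir on this host.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    src = f"{user}@{primary}:{remote_dir.rstrip('/')}/"
    dst = local_dir.rstrip("/") + "/"
    return ["rsync", "-a", "--delete", src, dst]


def mirror(primary, dir_pairs, run=subprocess.run):
    """Mirror each (remote_dir, local_dir) pair from the primary."""
    for remote_dir, local_dir in dir_pairs:
        run(rsync_mirror_cmd(primary, remote_dir, local_dir), check=True)


# Hypothetical usage (all paths are assumptions):
# mirror("hobbit1", [("/home/hobbit/server/etc", "/home/hobbit/failover/etc"),
#                    ("/home/hobbit/data", "/home/hobbit/failover/data")])
```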
This only provides failover, no load balancing. This could change if Henrik ever does the code changes to allow distributing worker modules.
* I discovered while searching that someone has created a patch (or patches) for rrdtool that allows it to store and retrieve data from MySQL or Postgres instead of, or in addition to, RRD files. This is very interesting as it would be transparent to Hobbit, and MySQL has excellent load balancing and replication built in. The author claimed that he had "re-written Big Brother" using these rrdtool patches to create a load-balanced, fault-tolerant monitoring cluster. I wonder if he hacked it so that all the configuration was stored in MySQL as well... I guess it doesn't matter, as I got the impression that he didn't want to share his work (except for the rrdtool patches) :-)
jonescr@cisco.com