Hi all,
I am redesigning the method we use for performing a failover to a disaster recovery installation of hobbit. I am interested in opinions on the approach and any shortcomings.
Note: This is not HA/clustering, it is for DR purposes.
We are aiming to have:
- a production hobbit deployment
- a DR hobbit deployment
Clients will be configured to send metrics to both servers, which will keep historical rrd data up to date etc.
The production server will be configured to send out alerts. The dr server will not.
At regular intervals, rsync will be used to synchronise data from the production server to the dr server, including the in memory checkpoint file.
In the event of a dr, the dr hobbit server will be promoted to active by restarting hobbit, and loading the checkpoint and alert configurations.
I am expecting that this will ensure that the dr server will be "up to date" with production as per the last checkpoint. This includes tests that have been disabled or acknowledged.
Prior to failback to the production hobbit installation, the reverse of the above would be performed. An rsync of rrd data files would be performed to cover any windows where one of the servers was offline for a period of time.
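As a rough illustration of the periodic sync step described above, here is a minimal Python sketch that wraps rsync. The paths and the DR host name are hypothetical placeholders, not taken from any real Hobbit install, and the helper names are made up for this example.

```python
import subprocess

# Hypothetical locations; adjust to the actual Hobbit install.
SYNC_PATHS = [
    "/var/lib/hobbit/data/rrd/",
    "/var/lib/hobbit/tmp/hobbitd.chk",
]
DR_HOST = "hobbit-dr.example.com"  # hypothetical DR server name


def build_rsync_cmd(path, host):
    """Return the rsync argv that copies one path to the same path on host."""
    # -a preserves permissions and timestamps; -z compresses data in transit.
    return ["rsync", "-az", path, f"{host}:{path}"]


def sync_all(paths, host, run=subprocess.run):
    """Run rsync for every path and return the list of paths that failed."""
    failed = []
    for path in paths:
        result = run(build_rsync_cmd(path, host))
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Calling `sync_all(SYNC_PATHS, DR_HOST)` from cron would give the "regular intervals" behaviour. For failback, the same function could be run on the dr server with the production host as the target.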
Is there anything wrong with this approach?
Cheers
Phil
-- Tel: 0400 466 952 Fax: 0433 123 226 email: philwild AT gmail.com
Hi Phil,
I know you said you don't want to use a HA/clustering solution, but I have a similar situation to yourself and I use a HA solution with heartbeat/drbd, and to be honest it saves me a load of hassle. OK, the failover happens automatically and I don't know that it has (which I'd argue is how I want it), but all the rrd files are kept in sync and all maintenance settings get maintained across the two servers. Plus I don't need to recall which server was down and which server I need to rsync from and to: the DRBD resource maintains all that for me and I just worry about configuring hobbit. Plus, as hobbit is only running on the active server, it's the only one sending out alerts.
I can give you more details on my configuration if you are interested.
Cheers,
Iain.
-- Iain Miller iainonthemove at gmail.com
Hi Iain,
I have contemplated this approach...
The two hobbit installations are about 15 km apart. Although I could cluster between two servers over this distance, it is not my preference.
My biggest issue with a clustered solution is that there is one copy of the data (albeit usually mirrored). If something goes wrong with the data (e.g. a mistake such as rm bb-hosts), it instantly happens to both sites. Recovery then requires restoration via tape, snapshot, etc. If it is via snapshot, then I have to roll back to the last snapshot (which may be acceptable depending on the technology being used).
I want my dr copy to be as close to production as possible but without any shared infrastructure that may allow the poisoning of both services. A warm standby seems to be a good approach and from my research, it seems quite feasible.
Cheers
Phil
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
-- Tel: 0400 466 952 Fax: 0433 123 226 email: philwild AT gmail.com
I'd like to hear about your setup...
-- Stewart
The revolution will not be televised. The revolution will be no re-run brothers; The revolution will be live.
On mán, 2008-05-19 at 09:03 +0100, Iain Miller wrote:
but all the rrd files are kept in sync and all maintenance settings get maintained across the two servers.
I would say that keeping the rrd files in sync is the toughest task.
How do you mix two rrd files which both have some different holes in their information?
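To make the question concrete, here is a minimal Python sketch of one possible merge rule: for each timestamp, take whichever server has a known value. It works on plain `{timestamp: value}` dicts with NaN marking a hole, which is an assumption made for this example. It does not read or write rrd files, and it ignores how consolidated archives inside an rrd would have to be rebuilt.

```python
import math


def merge_series(a, b):
    """Merge two {timestamp: value} series, filling holes in a from b.

    A hole is a missing timestamp or a NaN value. Where both series have a
    value, the one from a wins; where neither has one, the result stays NaN.
    """
    merged = {}
    for ts in sorted(set(a) | set(b)):
        va = a.get(ts, math.nan)
        vb = b.get(ts, math.nan)
        merged[ts] = vb if math.isnan(va) else va
    return merged
```

The rule is deliberately naive: it says nothing about what to do when both servers have a value for the same timestamp but the values differ.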
I am using a solution where the idea is to eliminate single points of failure. All clients send data to both hobbit servers, and both servers store rrd data and both servers have http service.
Eliminating single points of failure is a simple method which gives an uptime sufficient for my university, without resorting to more advanced methods.
--
Kindest Regards, Anna Jonna Ármannsdóttir,
Unix System Administration, Computing Services, University of Iceland.
A: Because people read from top to bottom.
Q: Why is top posting bad?
Re: Phil Wild 2008-05-19 <258e9b160805182339r6090664ax64681ff20a68821d at mail.gmail.com> [...]
clients will be configured to send metrics to both servers. which will keep historical rrd data up to date etc.
Hi,
just a note on that one, we are running a setup like that too, plus a third host for alerting. The problem we have is that if one of the hosts in BBDISPLAYS is down, clients will have trouble sending in data to any server, because of timeouts. Sometimes notifications lag for as much as half an hour.
I'm not saying your setup won't work, but you should keep in mind that you might effectively degrade availability/responsiveness because of the use of multiple servers.
Christoph
cb at df7cb.de | http://www.df7cb.de/
On mán, 2008-05-19 at 12:44 +0200, Christoph Berg wrote:
clients will be configured to send metrics to both servers. which will keep historical rrd data up to date etc.
Hi,
just a note on that one, we are running a setup like that too, plus a third host for alerting. The problem we have is that if one of the hosts in BBDISPLAYS is down, clients will have trouble sending in data to any server, because of timeouts. Sometimes notifications lag for as much as half an hour.
So the clients contact the servers one after another. The clients could be changed so that they contact the servers in parallel, so that if one of the processes times out, it will not affect the others.
This behaviour probably depends on the client itself. BBWin, Big Brother Client and Hobbit Client probably behave differently.
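A minimal Python sketch of the parallel idea, using threads rather than separate processes. It is not how any of the existing clients work. The port 1984 default, the timeout value and the function names are assumptions for illustration only.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def send_report(server, message, port=1984, timeout=5.0):
    """Send one message to one server; return True on success."""
    try:
        with socket.create_connection((server, port), timeout=timeout) as sock:
            sock.sendall(message.encode())
        return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def send_to_all(servers, message, sender=send_report):
    """Send to every server concurrently; return {server: succeeded}."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        results = pool.map(lambda s: sender(s, message), servers)
        return dict(zip(servers, results))
```

With this shape, a server that is down costs at most one timeout in total, instead of one timeout added in front of every server listed after it.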
--
Kindest Regards, Anna Jonna Ármannsdóttir,
Unix System Administration, Computing Services, University of Iceland.
A: Because people read from top to bottom.
Q: Why is top posting bad?
We overcome this by checking whether a BBDISPLAY server is down, then editing the client config file and bouncing the hobbit service.
No delays/responsiveness issues.
Much the same as hobbit invokes an email or SNPP event that notifies a user when something goes up or down, one could notify clients that a BBDISPLAY is down/up and have them act accordingly.
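A minimal Python sketch of the "edit the client config" step. It assumes the client config holds a shell-style line of the form `BBDISPLAYS="host1 host2"`. That layout and the function name are assumptions for this example, and checking which servers are up and restarting the client are left out.

```python
import re


def rewrite_bbdisplays(config_text, live_servers):
    """Return config_text with the BBDISPLAYS line listing only live servers."""
    new_line = 'BBDISPLAYS="{}"'.format(" ".join(live_servers))
    # Replace the whole BBDISPLAYS line; all other lines are left untouched.
    return re.sub(r"^BBDISPLAYS=.*$", lambda _m: new_line, config_text,
                  flags=re.MULTILINE)
```

The same function restores the full list once the failed server is back, by passing every server in `live_servers` again.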
Best,
PKrash
participants (6)
- annaj@hi.is
- cb@df7cb.de
- iainonthemove@gmail.com
- philwild@gmail.com
- pkrash@exegy.com
- stewartl42@gmail.com