xymon master-slave server
On Tue, April 5, 2016 10:59 am, eli wrote:
I am planning to build a secondary Xymon server as a backup. Is there a good method to sync between them so that both do not send alerts at the same time? If anyone has already implemented this, I would like to hear feedback. Thanks, Eli
There are a few different strategies one can use here, all depending on what kind of internal SLA you're expecting, how much bandwidth you're able to use, and whether you need identical systems or not.
The simplest solution (but the most bandwidth-intensive) is to run the two servers as parallel stacks and simply not alert on the secondary one. You can use Linux-HA or any of the more advanced cluster software to determine up/down status between the two boxes and have the secondary take over alerting when the primary isn't reachable. Configure your clients to send reports to both Xymon servers at the same time ($XYMONSERVERS) and you've in effect got two complete systems. (xymond_distribute can be used to pass disable/enable messages over.) The drawbacks are a) double the bandwidth use, and b) losing your acknowledgements and alert suppression when the failover occurs.
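To make the "only alert from the secondary when the primary is gone" rule concrete, here is a minimal Python sketch. It is not how Linux-HA decides things; it just assumes a single TCP probe against the primary's xymond port (1984 by default) as a stand-in for a real cluster heartbeat. The function names and the role strings are made up for illustration.

```python
import socket

XYMON_PORT = 1984  # default port xymond listens on


def peer_is_reachable(host, port=XYMON_PORT, timeout=3.0):
    """Return True if a TCP connection to the peer's xymond port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def should_alert(role, primary_reachable):
    """Decide whether this node should send alerts.

    The primary always alerts; the secondary alerts only while the
    primary cannot be reached. The role names are hypothetical labels.
    """
    if role == "primary":
        return True
    if role == "secondary":
        return not primary_reachable
    raise ValueError(f"unknown role: {role!r}")
```

A single probe like this cannot tell a dead primary from a broken network path between the two boxes, which is the split-brain case that proper cluster software is built to handle, so treat it as an illustration of the decision rather than a replacement for Linux-HA.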
Alternatively, you can keep the second server on cold standby, regularly rsyncing the checkpoint files for both xymond and xymond_alert over from the primary. This has the advantage that the secondary system isn't in use when it doesn't need to be. When failover happens, you start up xymond on the slave; it reads the last checkpoint you received (I'd advise increasing the checkpoint frequency to something like every few minutes, depending on your needs) and starts from there. The drawback is that you don't have graphing/history at all, and you're missing the last few minutes of changes.
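As a rough sketch of the cold-standby pull, the Python below builds an rsync command for the two checkpoint files and checks whether the local copy has gone stale. The checkpoint paths, the host name and the five-minute threshold are assumptions for illustration; use whatever paths your xymond and xymond_alert are actually started with.

```python
import subprocess

# Hypothetical checkpoint locations; adjust to your installation.
CHECKPOINT_FILES = [
    "/var/lib/xymon/tmp/xymond.chk",
    "/var/lib/xymon/tmp/alert.chk",
]


def build_rsync_command(primary, files, dest_dir):
    """Build an rsync argv that pulls the given files from the primary."""
    sources = [f"{primary}:{path}" for path in files]
    # -a preserves timestamps, so the standby can judge checkpoint age.
    return ["rsync", "-a", "--timeout=30", *sources, dest_dir.rstrip("/") + "/"]


def checkpoint_is_stale(mtime, now, max_age_seconds=300):
    """Return True if the checkpoint is older than max_age_seconds."""
    return (now - mtime) > max_age_seconds


def sync_checkpoints(primary, dest_dir):
    """Run the pull and return rsync's exit code (0 means success)."""
    cmd = build_rsync_command(primary, CHECKPOINT_FILES, dest_dir)
    return subprocess.run(cmd, check=False).returncode
```

Run from cron on the standby every few minutes, `sync_checkpoints("xymon-primary", "/var/lib/xymon/tmp")` would keep the files reasonably fresh, and `checkpoint_is_stale` gives you something to check before starting xymond on old data.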
If you have plenty of network bandwidth available, something like DRBD can be used to perform a complete synchronization of *saved* data between the servers.
Henrik had proposed a Xymon Swarm concept at http://lists.xymon.com/pipermail/xymon/2015-November/042684.html , which may also help you evaluate your site's needs.
Really, there are lots of different ways to conceptualize "high availability" for your monitoring system... I'd advise keeping things as simple as possible so as to eliminate failure points. In our case, we've had two live stacks running in parallel that (mostly) submit into a ticket system, which can de-dupe incoming host+svc alerts automatically, and that mostly defines the problem out of existence. The things directly emailed to us were of lower frequency, so we were fine with duplicate emails at first. When that got to be too annoying, we left xymond_alert enabled on the second system but used Linux-HA to simply disable Postfix when it wasn't master. We modified the startup script so that, when a failover occurred, it cleared out the local outbound queue before starting the service up. Xymon thus never had to care about whether it was primary/secondary at all.
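The Postfix trick can be sketched roughly as follows. This is not the actual startup script described above, only an assumed shape for one: on becoming master, delete the queued mail and then start Postfix; on becoming standby, stop it. `postsuper -d ALL`, `postfix start` and `postfix stop` are standard Postfix commands; the role names and function names are hypothetical.

```python
import subprocess


def postfix_commands_for(role):
    """Return the commands to run when this node takes on the given role."""
    if role == "master":
        # Drop alerts queued while standby, then begin delivering mail.
        return [["postsuper", "-d", "ALL"], ["postfix", "start"]]
    if role == "standby":
        # Keep xymond_alert running, but let its mail pile up undelivered.
        return [["postfix", "stop"]]
    raise ValueError(f"unknown role: {role!r}")


def apply_role(role, runner=subprocess.run):
    """Run each command in order; check=True raises on a non-zero exit."""
    for cmd in postfix_commands_for(role):
        runner(cmd, check=True)
```

Note that `postsuper -d ALL` removes every queued message on the box, so this only makes sense where that Postfix instance carries nothing but Xymon alerts.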
Hope that helps a little bit.
Regards, -jc
Thanks very much, J.C., for the detailed information.
On Wed, Apr 6, 2016 at 16:04, J.C. Cleaver<cleaver at terabithia.org> wrote:
participants (2)
- cleaver@terabithia.org
- den2k10@yahoo.com