On Thu, Mar 26, 2009 at 5:53 PM, Malcolm Hunter <malcolm.hunter at gmx.co.uk> wrote:
The other (the slave one) first checks the status of the other server (a simple wget of the status page can be enough) and only sends out the alert if this page is not green.
So, basically, both servers are triggering on the same alert, but the slave server only sends out the alert if the primary server is not green.
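A rough sketch of that gate on the slave side, in Python rather than shell. It assumes the primary's status page has already been fetched (with wget or similar) into a string, and that the overall colour is the first word of the page title, as in "green : Hobbit - Status @ ...". That title layout is an assumption, so check it against your own pages. The function names are made up for illustration.

```python
import re


def primary_color(html):
    """Return the first word of the page <title>, lowercased, or None.

    Assumption: the title starts with the overall colour, e.g.
    "<TITLE>green : Hobbit - Status @ ...</TITLE>".
    """
    match = re.search(r"<title>\s*(\w+)", html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def slave_should_alert(html):
    """Decide whether the slave should send the alert itself.

    html is the primary's status page as a string, or None if the
    fetch failed. The slave stays quiet only when the primary's page
    was fetched and reports green.
    """
    if html is None:
        # Primary unreachable: nobody else will send the alert.
        return True
    return primary_color(html) != "green"
```

Treating a failed fetch as "not green" means the slave takes over alerting whenever it cannot reach the primary at all.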
Wouldn't there be more involved in this? What if the primary server's hobbit daemon was down, but the web service was still running? The secondary server would want to report the hobbit daemon being down, but wouldn't because the primary server's page was still green and hadn't been updated.
The hobbit pages have a timestamp in the title, so the slave can grab the page, grep for <title>, extract the date, and convert it to seconds using GNU date. Then the slave gets its own timestamp in seconds, subtracts the primary's time, and complains if there's more than a couple of minutes' difference.
That timestamp doesn't update if the hobbit daemon is down.
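Something like the following could do the staleness check, sketched in Python instead of grep and GNU date. It assumes the title ends with "@ " followed by a date such as "Thu Mar 26 17:53:00 2009", that both servers use the same timezone, and that two minutes is the cutoff. All three are assumptions, so adjust the format string and threshold to match your pages. The function names are invented for the example.

```python
import re
import time

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_age_seconds(html, now, fmt="%a %b %d %H:%M:%S %Y"):
    """Return how many seconds old the page's title timestamp is.

    Assumption: the timestamp is whatever follows the last "@" in the
    title and matches fmt. Raises ValueError if there is no title or
    the date cannot be parsed.
    """
    match = TITLE_RE.search(html)
    if match is None:
        raise ValueError("no <title> element found")
    stamp = match.group(1).rsplit("@", 1)[-1].strip()
    page_time = time.mktime(time.strptime(stamp, fmt))
    return now - page_time


def primary_is_stale(html, now, max_age=120):
    """True if the primary's page is more than max_age seconds old."""
    return page_age_seconds(html, now) > max_age
```

On the slave this would be called as primary_is_stale(page, time.time()), with page holding the fetched HTML. A stale page is then treated the same as a non-green one.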
Ralph Mitchell