Some thoughts on clustered hobbit
First, let me express my thanks to Brian for putting this document together and allowing Henrik to distribute it! I've a lot of experience with IBM's HACMP for AIX, and getting a clustered configuration working as desired is not a trivial procedure.
Henrik -- check me on this: it's my impression we no longer need a 'BBPAGER' entry on the client-side bb-hosts because the hobbit server passes all potentially alertable statuses to hobbit-alert and it decides if an alert is really required.
Brian -- no offense, but I would rather categorise your configuration as "active/inactive". I'm looking at doing an "active/passive" cluster when time frees up -- about a month from now. The difference? I'm running two hobbit/apache instances all the time -- but the 'passive' (fallover) side is not doing alerting or network tests. It does build displays (it's my technical documentation server as well) and it does keep both history and rrd data updated. Both hosts show up on the client side as 'BBDISPLAY'. On failover it will take over the IP address for the hobbit display and re-launch hobbit with network testing and alerting enabled.
Depending on host count and test count, this might be a bad idea -- but we've only got about 300 entries in bb-hosts.
So -- thanks ever so much, again, for providing this -- it will make my life ever so much easier next month when I get the time to automate the failover environment.
Tom
Tom Kauffman NIBCO, Inc
On Mon, May 09, 2005 at 04:19:43PM -0500, Kauffman, Tom wrote:
Henrik -- check me on this: it's my impression we no longer need a 'BBPAGER' entry on the client-side bb-hosts because the hobbit server passes all potentially alertable statuses to hobbit-alert and it decides if an alert is really required.
Correct.
In fact it does make sense to remove it, because then the client will not initiate a connection to the server to send the "page" message - which hobbitd promptly just discards.
Brian -- no offense, but I would rather categorise your configuration as "active/inactive". I'm looking at doing an "active/passive" cluster when time frees up -- about a month from now. The difference? I'm running two hobbit/apache instances all the time -- but the 'passive' (fallover) side is not doing alerting or network tests. It does build displays (it's my technical documentation server as well) and it does keep both history and rrd data updated. Both hosts show up on the client side as 'BBDISPLAY'. On failover it will take over the IP address for the hobbit display and re-launch hobbit with network testing and alerting enabled.
That would certainly be interesting for me as well - it's the kind of setup I plan to implement when I get time.
Regards, Henrik
Hi All,
I am adding a custom graph to track Oracle tablespace usage. I currently have an "oracle" test which shows the overall green/yellow/red status of tablespaces based on percent full thresholds.
I want to append graphs showing each tablespace's size/used stats for a better view of historical growth, but I don't want the raw data showing in the HTML page.
I seem to recall using a <data> type tag in the message text but can't find a good example. I'm comfortable with post-processing the message to generate the actual RRDs and graphs but would appreciate some help on the message side!
TIA, Andy.
#####################################################################################
This email is intended for the person to whom it is addressed only. If you are not the intended recipient, do not read, copy or use the contents in any way. The opinions expressed may not necessarily reflect those of ZESPRI Group of Companies ('ZESPRI').
While every effort has been made to verify the information contained herein, ZESPRI does not make any representations as to the accuracy of the information or to the performance of any data, information or the products mentioned herein. ZESPRI will not accept liability for any losses, damage or consequence, however, resulting directly or indirectly from the use of this e-mail/attachments. #####################################################################################
Andy France wrote on 10/05/2005 12:22:48:
I have made some headway on this - what I was looking for was a simple set of HTML comment tags <!-- Comment -->
So now I have my data embedded in the message thus:
<!--DATA oracle.ZP1.psapbtabd.rrd:134696864:2010520:1.49 oracle.ZP1.psapbtabi.rrd:95354800:1822184:1.91 oracle.ZP1.psapuser1i.rrd:2585600:282128:10.91 oracle.ZP1.psapstabd.rrd:14479360:1598952:11.04 oracle.ZP1.psapstabi.rrd:14520304:1872960:12.90 oracle.ZP1.dbtotsize.rrd:317055808:46472960:14.66 oracle.ZP1.psapes40bd.rrd:6184960:1728248:27.94 oracle.ZP1.system.rrd:307200:108216:35.23 oracle.ZP1.psapes40bi.rrd:2068480:799904:38.67 oracle.ZP1.psapuser1d.rrd:2068480:887688:42.92 oracle.ZP1.psapel40bd.rrd:2068480:1025088:49.56 oracle.ZP1.psapddicd.rrd:512000:258168:50.42 oracle.ZP1.psapclud.rrd:10342400:5982072:57.84 oracle.ZP1.psapdocud.rrd:71680:44984:62.76 oracle.ZP1.psappoold.rrd:4136960:2726784:65.91 oracle.ZP1.psapdocui.rrd:71680:51568:71.94 oracle.ZP1.psapddici.rrd:512000:380504:74.32 oracle.ZP1.psappooli.rrd:4136960:3139896:75.90 oracle.ZP1.psapsourcei.rrd:307200:243096:79.13 oracle.ZP1.psapclui.rrd:2068480:1771992:85.67 oracle.ZP1.psapsourced.rrd:307200:264672:86.16 oracle.ZP1.psapproti.rrd:1024000:888880:86.81 oracle.ZP1.psapprotd.rrd:4136960:3735416:90.29 oracle.ZP1.psapel40bi.rrd:409600:373160:91.10 oracle.ZP1.psaproll.rrd:8273920:8065888:97.49 oracle.ZP1.psaploadd.rrd:102400:102288:99.89 oracle.ZP1.psaploadi.rrd:102400:102288:99.89 oracle.ZP1.psaptemp.rrd:6205440:6205416:100.00 -->
It doesn't show up on the web page, and I can happily parse it with my custom rrd handler script (I really must get to grips with moving it to C...).
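For anyone wanting to do the same, here is a minimal Python sketch of pulling that hidden block back out of a status message. It is not Andy's handler script. The field order (total size, used size, percent used) is an assumption inferred from the sample figures, where used divided by total matches the last field; the units are not stated in the thread. The function name and the sample status line are hypothetical.

```python
import re

# "DATA" is the marker used in the message above; everything up to the
# closing "-->" is treated as whitespace-separated entries.
DATA_BLOCK = re.compile(r"<!--DATA\s+(.*?)\s*-->", re.DOTALL)


def parse_tablespace_data(message):
    """Return {rrd_filename: (total, used, pct_used)} from a status message.

    Assumes each entry looks like name:total:used:pct (field meaning is
    inferred from the sample data, not documented in the thread).
    """
    match = DATA_BLOCK.search(message)
    if match is None:
        return {}
    result = {}
    for entry in match.group(1).split():
        # Split from the right so the dotted file name stays in one piece.
        name, total, used, pct = entry.rsplit(":", 3)
        result[name] = (int(total), int(used), float(pct))
    return result


# Two entries copied from the message above; the status line is made up.
sample = (
    "status myhost.oracle green\n"
    "<!--DATA oracle.ZP1.system.rrd:307200:108216:35.23 "
    "oracle.ZP1.psaptemp.rrd:6205440:6205416:100.00 -->"
)
parsed = parse_tablespace_data(sample)
```

Each key in `parsed` is already the RRD file name, so a handler could loop over the dictionary and update one RRD per tablespace.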
But now I'm having trouble with the hobbitgraph.cfg entry. What I want is a series of stacked graphs, one for each rrd file, showing used space over total space. I can't figure out how to specify multiple sources with FNPATTERN while asking rrdgraph to create a unique graph per file, rather than combining them into one graph as is done for disk etc.
I can't see any useful switches in the rrdgraph man page. Can anyone help with this?
TIA, Andy.
On 5/9/05, Kauffman, Tom <KauffmanT at nibco.com> wrote:
First, let me express my thanks to Brian for putting this document together and allowing Henrik to distribute it! I've a lot of experience with IBM's HACMP for AIX, and getting a clustered configuration working as desired is not a trivial procedure.
Henrik -- check me on this: it's my impression we no longer need a 'BBPAGER' entry on the client-side bb-hosts because the hobbit server passes all potentially alertable statuses to hobbit-alert and it decides if an alert is really required.
Brian -- no offense, but I would rather categorise your configuration as "active/inactive". I'm looking at doing an "active/passive" cluster when time frees up -- about a month from now. The difference? I'm running two hobbit/apache instances all the time -- but the 'passive' (fallover) side is not doing alerting or network tests. It does build displays (it's my technical documentation server as well) and it does keep both history and rrd data updated. Both hosts show up on the client side as 'BBDISPLAY'. On failover it will take over the IP address for the hobbit display and re-launch hobbit with network testing and alerting enabled.
I agree with your assessment, but chose the model for a few reasons (note that I'm basing my experience on about 2 1/2 years running a dual big brother failover setup):
- There is always one repository for both configuration and data that are kept reasonably identical on both systems (within the synch delay).
- There is only one ip address accepting BB reports cutting down on both network traffic and firewall rules (for hosts in locked down vlans).
- The other system can be dedicated to another purpose (it currently hosts our documentation site that fails over in the opposite direction).
- No redundant work is done. Indeed, no load is being 'shared' across the systems unless you host the web server on the other box.
There is a risk to this, based on the possibility of complete machine failure in between synchronizations: Hobbit may come up without all the updates for hosts or alerts. Based on my current model, I will lose about a day of historical data. These synch rates can be changed, and a gigabit crossover between machines cuts down on any traffic imposed by multiple synchs.
Note that you could very easily turn off the hobbit alerts with the same clustering software by truncating and restoring the hobbit-alerts.cfg file. Not sure how to disable the network tests, so that may require some custom coding... Once complete, you could use the same cluster resource sw to accomplish a 'hot' standby.
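A minimal Python sketch of the truncate-and-restore idea follows, as something cluster software could call from its start and stop hooks. It rests on the suggestion above that an empty hobbit-alerts.cfg means no alerts are sent, which is not verified here. The function names, the ".active" suffix, and the temporary directory are all hypothetical, and the file content is a placeholder rather than real alert rules.

```python
import os
import shutil
import tempfile


def disable_alerts(cfg_path, saved_path):
    """Save the live alert rules, then truncate the config file."""
    if os.path.getsize(cfg_path) == 0:
        return  # already truncated; do not overwrite the saved copy
    shutil.copy2(cfg_path, saved_path)
    with open(cfg_path, "w"):
        pass  # opening in "w" mode truncates the file to zero length


def enable_alerts(cfg_path, saved_path):
    """Restore the saved alert rules over the truncated file."""
    if os.path.exists(saved_path):
        shutil.copy2(saved_path, cfg_path)


# Demonstration against a throwaway copy, not a real Hobbit installation.
workdir = tempfile.mkdtemp()
cfg = os.path.join(workdir, "hobbit-alerts.cfg")
saved = cfg + ".active"
with open(cfg, "w") as fh:
    fh.write("# placeholder alert rules\n")

disable_alerts(cfg, saved)      # node becomes the standby
size_while_passive = os.path.getsize(cfg)
disable_alerts(cfg, saved)      # a repeated call must not lose the rules
enable_alerts(cfg, saved)       # node takes over after failover
with open(cfg) as fh:
    restored = fh.read()
```

The guard in `disable_alerts` matters because cluster hooks are often run more than once; without it a second call would save the empty file over the real rules.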
Depending on host count and test count, this might be a bad idea -- but we've only got about 300 entries in bb-hosts.
So -- thanks ever so much, again, for providing this -- it will make my life ever so much easier next month when I get the time to automate the failover environment.
Tom
Tom Kauffman NIBCO, Inc
On May 10, 2005, at 3:21 AM, Brian Lynch wrote:
On 5/9/05, Kauffman, Tom <KauffmanT at nibco.com> wrote:
First, let me express my thanks to Brian for putting this document together and allowing Henrik to distribute it! I've a lot of experience with IBM's HACMP for AIX, and getting a clustered configuration working as desired is not a trivial procedure.
Henrik -- check me on this: it's my impression we no longer need a 'BBPAGER' entry on the client-side bb-hosts because the hobbit server passes all potentially alertable statuses to hobbit-alert and it decides if an alert is really required.
Brian -- no offense, but I would rather categorise your configuration as "active/inactive". I'm looking at doing an "active/passive" cluster when time frees up -- about a month from now. The difference? I'm running two hobbit/apache instances all the time -- but the 'passive' (fallover) side is not doing alerting or network tests. It does build displays (it's my technical documentation server as well) and it does keep both history and rrd data updated. Both hosts show up on the client side as 'BBDISPLAY'. On failover it will take over the IP address for the hobbit display and re-launch hobbit with network testing and alerting enabled.
Software exists to do this active/passive; a search for HA or FAILOVER will turn up several options, and the Linux-HA project is one example.
The biggest issue in my view is keeping the systems in sync, which is often done with shared storage.
I agree with your assessment, but chose the model for a few reasons (note that I'm basing my experience on about 2 1/2 years running a dual big brother failover setup):
- There is always one repository for both configuration and data that are kept reasonably identical on both systems (within the synch delay).
- There is only one ip address accepting BB reports cutting down on both network traffic and firewall rules (for hosts in locked down vlans).
- The other system can be dedicated to another purpose (it currently hosts our documentation site that fails over in the opposite direction).
- No redundant work is done. Indeed, no load is being 'shared' across the systems unless you host the web server on the other box.
There is a risk to this based on the possibility of complete machine failure in between synchronizations. Hence, Hobbit may come up without all the updates for hosts or alerts. Based on my current model, I will lose about a day of historical data. These synch rates can be changed and a gigabit crossover between machines cuts down on any traffic imposed by multiple synch's.
Note that you could very easily turn off the hobbit alerts with the same clustering software by truncating and restoring the hobbit-alerts.cfg file. Not sure how to disable the network tests, so that may require some custom coding... Once complete, you could use the same cluster resource sw to accomplish a 'hot' standby.
So I think the only issue you have with this method is that you don't want the extra network load. If you are willing to require a crossover cable, and that is a requirement for most HA solutions, then one solution might be to add the idea of a BACKUP_BBDISPLAY. The BBDISPLAY server would forward all incoming messages to the BACKUP, and if you have the private network cable it would not cause any load on your network.
You would also need a way of telling the Hobbit software on the system that it is the BACKUP server.
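To make the forwarding idea concrete, here is a rough Python sketch. BACKUP_BBDISPLAY is only a proposal in this thread, not an existing Hobbit setting, so every name below is hypothetical: `handle_incoming`, `send_tcp`, the backup address (taken from a documentation-only range), and the sample message. Port 1984 is assumed to be the usual Big Brother/Hobbit port.

```python
import socket


def send_tcp(message, host, port, timeout=5.0):
    """Deliver one message over a short-lived TCP connection."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(message)


def handle_incoming(message, process, backup=None, send=send_tcp):
    """Process a message locally, then relay a copy to the backup if one is set.

    backup is a (host, port) tuple, or None on the backup server itself,
    which is one way of telling a node which role it has.
    """
    process(message)
    if backup is None:
        return
    try:
        send(message, backup[0], backup[1])
    except OSError:
        # An unreachable backup must not stop the primary from working.
        pass


# Demonstration with a fake sender, so no network access is needed.
processed = []
forwarded = []


def fake_send(message, host, port):
    forwarded.append((host, port, message))


handle_incoming(b"status myhost.oracle green", processed.append,
                backup=("192.0.2.10", 1984), send=fake_send)
```

On the backup node the same function would be called with `backup=None`, so messages are processed but not relayed again.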
John
participants (5)
- Andy@zespri.com
- brianlynch@gmail.com
- henrik@hswn.dk
- jturner@ns.wcpss.net
- KauffmanT@nibco.com