[hobbit] Hobbit newbie from BB: differences and what may I lose from migrating?

brodie＠mcw.edu

2 Aug 2006

5:01 a.m.

Basically, you lose NOTHING when you change a BB server to a Hobbit server. At first, it'll look very similar. Life will be good. The BB clients can run untouched.

However, once you start setting up a few *Hobbit* clients, you'll quickly see what Hobbit DOES, and what a typical BB client does NOT do.

That's the moment when you'll race to wipe BB completely. Took me about a week. :-)

-----Original Message-----
From: Jordan Mendler [mailto:jmendler at ucla.edu]
Sent: Tuesday, August 01, 2006 8:43 PM
To: hobbit at hswn.dk
Subject: RE: [hobbit] Hobbit newbie from BB: differences and what may I lose from migrating?

Cool. I guess I'll add a second display to bb-hosts and give hobbit a run. I'll just use Shmux to deploy bb-hosts to all the clients (figured I'd mention that great application while I'm at it :-)

Once again, thanks for all the help, everyone. Hopefully my next message here will be as a convert.

Jordan

joe＠tmsusa.com

2 Aug

5:18 a.m.

Brodie, Kent wrote:

...

Basically, you lose NOTHING when you change a BB server to a Hobbit server. At first, it'll look very similar. Life will be good. The BB clients can run untouched.

However, once you start setting up a few *Hobbit* clients, you'll quickly see what Hobbit DOES, and what a typical BB client does NOT do.

That's the moment when you'll race to wipe BB completely. Took me about a week. :-)

We'd like to replace bb with hobbit, but there's no way we can do without the bb failover mechanism. We have 2 separate data centers, and while there are bb servers in both data centers monitoring the hosts on both sides, only side "a" does notifications. When side "b" cannot reach side "a", side "b" "fails over" and takes on the notification tasks until side "a" becomes reachable again.

There's nothing like that in hobbit yet, but if there were, we'd be able to make the switch.
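
The failover rule described above can be sketched as a small decision function. This is a hypothetical illustration in Python, not Big Brother's actual implementation; the names `should_notify`, `side` and `peer_reachable` are invented for the example.

```python
def should_notify(side: str, peer_reachable: bool) -> bool:
    """Decide whether this monitoring server sends notifications.

    side: "a" (the normal notifier) or "b" (the standby notifier).
    peer_reachable: whether this server can currently reach the other side.
    """
    if side == "a":
        # Side "a" owns notifications whenever it is running.
        return True
    if side == "b":
        # Side "b" takes over only while it cannot reach side "a".
        return not peer_reachable
    raise ValueError(f"unknown side: {side!r}")


# Both sides keep monitoring all hosts; only the notifier role moves.
for side, reachable in [("a", True), ("b", True), ("b", False)]:
    print(side, reachable, should_notify(side, reachable))
```

Note that this is different from an active/passive cluster: both servers stay fully active for monitoring, and only the notification duty depends on reachability.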

henrik＠hswn.dk

7:31 a.m.

On Tue, Aug 01, 2006 at 10:18:36PM -0700, J Sloan wrote:

...

Brodie, Kent wrote:

...
Basically, you lose NOTHING when you change a BB server to a Hobbit server.

We'd like to replace bb with hobbit, but there's no way we can do without the bb failover mechanism. We have 2 separate data centers, and while there are bb servers in both data centers monitoring the hosts on both sides, only side "a" does notifications. When side "b" cannot reach side "a", side "b" "fails over" and takes on the notification tasks until side "a" becomes reachable again.

There's nothing like that in hobbit yet, but if there were, we'd be able to make the switch.

I won't say it is being worked on, but it is definitely on my agenda. My own setup is identical to yours, except that we have a procedure for doing the failover from site "a" to site "b" manually. I've done some planning for how to implement an active/passive cluster-like setup in Hobbit, so ... it's coming.

Regards, Henrik

stephane.caminade＠ias.u-psud.fr

10:05 a.m.

An HTML attachment was scrubbed... URL: <http://lists.xymon.com/pipermail/xymon/attachments/20060802/dfb0b41c/attachment.html>

olivier.beau＠telecomitalia.fr

10:23 a.m.

New subject: rrd-data.log

Hi,

I'm having "Internal error: Duplicate match ignored" in my rrd-data.log, what could cause this ?

olivier

henrik＠hswn.dk

10:59 a.m.

New subject: [hobbit] rrd-data.log

On Wed, Aug 02, 2006 at 12:23:53PM +0200, Beau Olivier wrote:

...

I'm having "Internal error: Duplicate match ignored" in my rrd-data.log, what could cause this ?

It means your netstat data doesn't look like what Hobbit expects: basically, it found two or more values for the same piece of data.

The best way of identifying which data causes this is probably to run two things at the same time:

1. Log in as the hobbit user and run:
   bbcmd hobbitd_channel --channel=data tee /tmp/data.log
2. Run "tail -f" on the rrd-data.log file.

When you see that error message in the rrd-data.log file, terminate the first command. You should then have the "guilty" data at the end of the /tmp/data.log file.

I'd obviously be interested to see what it looks like.

Regards, Henrik

olivier.beau＠telecomitalia.fr

11:50 a.m.

New subject: [hobbit] rrd-data.log

Hi,

Yes, this is interesting, and I think it points to a new problem, 802.1q on NICs:

eth1      Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2798842 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8950695 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:217776970 (207.6 MiB)  TX bytes:4275403340 (3.9 GiB)
          Interrupt:201

eth1.9    Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          inet addr:192.168.250.33  Bcast:192.168.250.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2226941 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3441485 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:520111630 (496.0 MiB)  TX bytes:410431496 (391.4 MiB)

eth1.15   Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          inet addr:10.11.99.99  Bcast:10.11.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1909363 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7253215 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:110322292 (105.2 MiB)  TX bytes:1702401944 (1.5 GiB)

olivier

henrik＠hswn.dk

2:52 p.m.

New subject: [hobbit] rrd-data.log

On Wed, Aug 02, 2006 at 01:50:28PM +0200, Beau Olivier wrote:

...

Hi,

Yes, this is interesting, and I think it points to a new problem, 802.1q on NICs:

eth1      Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2798842 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8950695 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:217776970 (207.6 MiB)  TX bytes:4275403340 (3.9 GiB)
          Interrupt:201

eth1.9    Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          inet addr:192.168.250.33  Bcast:192.168.250.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2226941 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3441485 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:520111630 (496.0 MiB)  TX bytes:410431496 (391.4 MiB)

Perhaps, but these data should not get anywhere near the code that prints out this message. The code that generates that message is the one that parses the output from "netstat -s" which should look like

Ip:
    3017099 total packets received
    1 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    3017058 incoming packets delivered
    3154813 requests sent out
Icmp:
    51081 ICMP messages received
    0 input ICMP message failed.

What does this command report on your host?

Regards, Henrik

olivier.beau＠telecomitalia.fr

3:02 p.m.

New subject: [hobbit] rrd-data.log

Here is the netstat output from data.log:

Ip:
    6774912 total packets received
    8069 forwarded
    0 incoming packets discarded
    6766842 incoming packets delivered
    12918060 requests sent out
Icmp:
    725255 ICMP messages received
    1 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 712247
        timeout in transit: 30
        echo requests: 4212
        echo replies: 8766
    716456 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 712244
        echo replies: 4212
Tcp:
    19091 active connections openings
    20926 passive connection openings
    1 failed connection attempts
    2952 connection resets received
    2 connections established
    4460418 segments received
    11359826 segments send out
    76470 segments retransmited
    0 bad segments received.
    3763 resets sent
Udp:
    105882 packets received
    711976 packets to unknown port received.
    0 packet receive errors
    817609 packets sent
TcpExt:
    ArpFilter: 0
    24632 TCP sockets finished time wait in fast timer
    1629 delayed acks sent
    2 delayed acks further delayed because of locked socket
    Quick ack mode was activated 10 times
    1101 packets directly queued to recvmsg prequeue.
    208762 packets directly received from backlog
    8763 packets directly received from prequeue
    417698 packets header predicted
    155 packets header predicted and directly queued to user
    TCPPureAcks: 586976
    TCPHPAcks: 3180140
    TCPRenoRecovery: 0
    TCPSackRecovery: 46644
    TCPSACKReneging: 0
    TCPFACKReorder: 0
    TCPSACKReorder: 0
    TCPRenoReorder: 0
    TCPTSReorder: 0
    TCPFullUndo: 0
    TCPPartialUndo: 0
    TCPDSACKUndo: 0
    TCPLossUndo: 1
    TCPLoss: 20751
    TCPLostRetransmit: 61
    TCPRenoFailures: 0
    TCPSackFailures: 272
    TCPLossFailures: 1
    TCPFastRetrans: 62624
    TCPForwardRetrans: 1657
    TCPSlowStartRetrans: 2052
    TCPTimeouts: 3170
    TCPRenoRecoveryFail: 0
    TCPSackRecoveryFail: 3672
    TCPSchedulerFailed: 0
    TCPRcvCollapsed: 0
    TCPDSACKOldSent: 11
    TCPDSACKOfoSent: 0
    TCPDSACKRecv: 0
    TCPDSACKOfoRecv: 0
    TCPAbortOnSyn: 0
    TCPAbortOnData: 1432
    TCPAbortOnClose: 9
    TCPAbortOnMemory: 0
    TCPAbortOnTimeout: 8
    TCPAbortOnLinger: 0
    TCPAbortFailed: 0
    TCPMemoryPressures: 0

henrik＠hswn.dk

3:05 p.m.

New subject: [hobbit] rrd-data.log

On Wed, Aug 02, 2006 at 04:52:37PM +0200, Henrik Stoerner wrote:

...

On Wed, Aug 02, 2006 at 01:50:28PM +0200, Beau Olivier wrote:

...
Hi,

Yes, this is interesting, and I think it points to a new problem, 802.1q on NICs:

eth1      Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2798842 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8950695 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:217776970 (207.6 MiB)  TX bytes:4275403340 (3.9 GiB)
          Interrupt:201

eth1.9    Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          inet addr:192.168.250.33  Bcast:192.168.250.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2226941 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3441485 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:520111630 (496.0 MiB)  TX bytes:410431496 (391.4 MiB)

Perhaps, but these data should not get anywhere near the code that prints out this message.

Yikes, I cannot remember my own code. You're right - it IS the interface statistics code that triggers this error. OK, I'll try and work out why and how it can be fixed.

Regards, Henrik

henrik＠hswn.dk

3:30 p.m.

New subject: [hobbit] rrd-data.log

On Wed, Aug 02, 2006 at 12:23:53PM +0200, Beau Olivier wrote:

...

I'm having "Internal error: Duplicate match ignored" in my rrd-data.log, what could cause this ?

Turns out to be a couple of bad regular expressions in the interface statistics code. This patch should fix it for both the AIX and Linux systems you've reported this on.

Regards, Henrik

Schrittenlocher＠rz.uni-frankfurt.de

3 Aug

7:24 a.m.

New subject: [hobbit] rrd-data.log

Hi,

we have the same issue for netstat and vmstat on Sun Solaris 9 (hobbit 4.1.2). We also had it for other tests while running more than two instances of the hobbit client, using different virtual hosts, on one machine.

Regards, Rolf

...

On Wed, Aug 02, 2006 at 12:23:53PM +0200, Beau Olivier wrote:

...
I'm having "Internal error: Duplicate match ignored" in my rrd-data.log, what could cause this ?

Turns out to be a couple of bad regular expressions in the interface statistics code. This patch should fix it for both the AIX and Linux systems you've reported this on.

Regards, Henrik

------------------------------------------------------------------------

--- hobbitd/rrd/do_ifstat.c 2006/08/01 21:32:37 1.7
+++ hobbitd/rrd/do_ifstat.c 2006/08/02 15:25:48
@@ -20,7 +20,7 @@
 /* eth0 Link encap: */
 /* RX bytes: 1829192 (265.8 MiB) TX bytes: 1827320 (187.7 MiB */
 static const char *ifstat_linux_exprs[] = {
-	"^([a-z]+[0-9]+)\\s",
+	"^([a-z]+[0123456789.:]+)\\s",
 	"^\\s+RX bytes:([0-9]+) .*TX bytes.([0-9]+) "
 };

@@ -73,7 +73,7 @@
  */
 static const char *ifstat_aix_exprs[] = {
 	"^ETHERNET STATISTICS \\(([a-z0-9]+)\\) :",
-	"^Bytes:\\s+(\\d+)\\s+(\\d+)"
+	"^Bytes:\\s+(\\d+)\\s+Bytes:\\s+(\\d+)"
 };

------------------------------------------------------------------------
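
To see why the Linux expression matters for 802.1q interfaces, the following Python sketch runs the old and new interface-name patterns from the patch over a trimmed copy of the ifconfig output reported earlier in this thread. The `collect` function is a hypothetical stand-in for a parser, not Hobbit's actual code: it simply files each RX/TX line under the last interface name it recognised. It also assumes that Python's `re` syntax behaves like the C regex library for these particular patterns.

```python
import re

# Interface-name patterns from the patch, translated from the C string literals.
OLD_NAME = re.compile(r"^([a-z]+[0-9]+)\s")
NEW_NAME = re.compile(r"^([a-z]+[0123456789.:]+)\s")
# Byte-counter pattern, which the patch leaves unchanged.
RX_TX = re.compile(r"^\s+RX bytes:([0-9]+) .*TX bytes.([0-9]+) ")

# Trimmed ifconfig output modelled on the report in this thread.
SAMPLE = """\
eth1      Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          RX bytes:217776970 (207.6 MiB)  TX bytes:4275403340 (3.9 GiB)
eth1.9    Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          RX bytes:520111630 (496.0 MiB)  TX bytes:410431496 (391.4 MiB)
"""


def collect(name_re, text):
    """Hypothetical parser: group (rx, tx) pairs under the last name seen."""
    seen = {}
    current = None
    for line in text.splitlines():
        m = name_re.match(line)
        if m:
            current = m.group(1)
            continue
        m = RX_TX.match(line)
        if m and current is not None:
            pair = (int(m.group(1)), int(m.group(2)))
            seen.setdefault(current, []).append(pair)
    return seen


old = collect(OLD_NAME, SAMPLE)
new = collect(NEW_NAME, SAMPLE)
print("old pattern:", old)
print("new pattern:", new)
```

With the old pattern, `eth1.9` is not recognised as a new interface, so in this sketch its counters land under `eth1`. That is one plausible way to end up with two values for the same piece of data. With the new pattern, each interface gets its own entry.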

--
Kind regards,
Rolf Schrittenlocher
HRZ/BDV, Senckenberganlage 31, 60054 Frankfurt
Tel: (49) 69 - 798 28908
Fax: (49) 69 - 798 28817
LBS: lbs-f at mlist.uni-frankfurt.de
Personal: schritte at rz.uni-frankfurt.de

joe＠tmsusa.com

2 Aug

4:22 p.m.

Stephane Caminade wrote:

...

Have you considered setting up some kind of Heartbeat or VRRP system? At my lab, we use VRRP to share one IP between a master DNS and a secondary DNS, which takes over if the primary fails (we have the same system for our web site and our mail server). If the slave cannot contact the master, it takes over the 'public' IP and can start some services, such as bind or dhcpd. Heartbeat seems to offer the same kind of possibilities, but I haven't looked into it yet. Maybe you could set up your "b" site to start sending notifications in the event that site "a" is unreachable?

We thought about this, and the problem with the generic solutions is that they tend to be active/passive. We need both sides active and fully functional all the time, just without redundant notifications, and the failover mechanism of bb does exactly what is needed, out of the box.

We could, given enough time and effort, implement something that would do what we need, but management tends to be very conservative about change, and very reluctant to allow us to spend time on anything not related to the current projects. It's the power of inertia, and the old "If it ain't broke, don't fix it" mentality. IOW, the bb/bbgen-3.6 combo is "good enough" to keep running.

ralphmitchell＠gmail.com

3 Aug

6:32 a.m.

On 8/2/06, J Sloan <joe at tmsusa.com> wrote:

...

We could, given enough time and effort, implement something that would do what we need, but management tends to be very conservative about change, and very reluctant to allow us to spend time on anything not related to the current projects. It's the power of inertia, and the old "If it ain't broke, don't fix it" mentality. IOW, the bb/bbgen-3.6 combo is "good enough" to keep running.

I have a similar kind of management. I came across Hobbit around Christmas and have been running it in parallel to Big Brother since then. The problem of how to switch over was solved for me back in May when the power supply in my Big Brother server blew out. I swear it was nothing I did... :) The machine is old and probably off maintenance, so I figured it would be faster to load a backup copy of my checkout scripts onto the Hobbit server and run with that.

Everybody I've spoken with about it either doesn't care or prefers Hobbit. The lone exception being one person who would prefer to just click on a recycle icon to flip between the main page and the summary, instead of using the drop-down menu...

Ralph Mitchell
