Adam Goryachev wrote:
Anyway, the problem is that, since approximately then, a number of client reports have not been received completely. Sometimes part of the ps output is truncated, sometimes the ports section is truncated, etc. This leads to false-positive alerts (i.e., procs goes red because some monitored processes appear not to be running, since they were listed after the point of truncation).
I've increased the timeout on hobbitd (--timeout=60), but this doesn't seem to have helped. The only common factors among the clients which have this problem are:
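One way to confirm which reports arrive truncated is to check each stored client message for the sections that should be in it. The following is only a rough sketch: it assumes client messages mark each section with a line of the form [name] (as in [ps] and [ports]), and the names EXPECTED_SECTIONS and missing_sections are made up for this illustration, not part of hobbit.

```python
import re

# Hypothetical list of sections we expect in every client report;
# adjust it to whatever your clients actually send.
EXPECTED_SECTIONS = ("ps", "ports")

# Assumption: a section starts with a line containing only "[name]".
SECTION_RE = re.compile(r"^\[([^\]]+)\]\s*$", re.MULTILINE)


def sections_present(message):
    """Return the section names found in a client message, in order."""
    return SECTION_RE.findall(message)


def missing_sections(message, expected=EXPECTED_SECTIONS):
    """Return the expected sections that do not appear in the message."""
    present = set(sections_present(message))
    return [name for name in expected if name not in present]
```

A report cut off inside the ps output would then show up with "ports" in the missing list, which separates a transport problem from a process that really is not running.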
- Most of them are running bbproxy and passing status messages from a number of clients.
- The rest of them are on very slow connections, or frequently very busy connections.
I have made some 'progress' of sorts.
I increased the MAX values, as I was getting some "Oversize ... truncated" messages in my log file. I then went home thinking "Great, I managed to solve this one thing today at least". Except I started getting the messages again a few hours later.
So after further investigation, I've decided I really can't work out what is happening, and why it isn't working. I've enabled debug output from bbproxy, but I don't really know what it all means.
I can see that if I set bbproxy to only forward messages to 127.0.0.1 the local hobbit server gets all the data correctly. If I add the remote server, then some things don't work properly. Since it is likely all a big jumbled mess by now, I'll post a few sections of config files, and hopefully someone will notice my stupid mistake (or multiple mistakes)...
I have a network 10.x.x.x which has a hobbit server at 10.30.10.9. All client machines report to 10.30.10.9 as the BBDISPLAY/BBPAGER (most are Windows PCs using the BB Windows client), one is a Linux hobbit-client, and of course 10.30.10.9 is itself a hobbit client (plus a couple of old ext scripts using the old BB env). I think all this is working fine, since nothing goes randomly purple/red.
10.30.10.9 is behind NAT but has complete access to the internet.
I have a remote server behind a NAT router which has port 1984 forwarded to it. It is receiving reports from around 20 other hobbit client machines perfectly, so I don't suspect the NAT router/hobbit config itself.
Some config from 10.30.10.9:
hobbitserver.cfg:
BBSERVERIP="127.0.0.1"
BBDISP="127.0.0.1"
BBDISPLAYS=""
MAXLINE="32768"

hobbitclient.cfg:
BBDISP="10.30.10.9"
BBDISPLAYS=""
BB="$BBHOME/bin/bb --debug --timeout=60"
MAXLINE="32768"

hobbitlaunch.cfg:
[hobbitd]
    ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
    CMD hobbitd --pidfile=$BBSERVERLOGS/hobbitd.pid
        --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk
        --checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log
        --admin-senders=127.0.0.1,$BBSERVERIP --store-clientlogs=!msgs
        --listen=127.0.0.1

[bbproxy]
    ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
    CMD $BBHOME/bin/bbproxy --hobbitd
        --bbdisplay=123.234.456.567,127.0.0.1 --listen=10.30.10.9
        --report=$MACHINE.bbproxy --no-daemon --timeout=30
        --pidfile=$BBSERVERLOGS/bbproxy.pid --debug --log-details
    CMD $BBHOME/bin/bbproxy --hobbitd --bbdisplay=127.0.0.1
        --listen=10.30.10.9 --report=$MACHINE.bbproxy --no-daemon --timeout=30
        --pidfile=$BBSERVERLOGS/bbproxy.pid --debug --log-details
    LOGFILE $BBSERVERLOGS/bbproxy.log

[hobbitclient]
    ENVFILE /usr/lib/hobbit/client/etc/hobbitclient.cfg
    NEEDS hobbitd
    CMD /usr/lib/hobbit/client/bin/hobbitclient.sh
    LOGFILE $BBSERVERLOGS/hobbitclient.log
    INTERVAL 5m
On the remote hobbit server with the public IP I have:

hobbitserver.cfg:
BBSERVERIP="192.168.2.6"
BBDISP="192.168.2.6"
BBDISPLAYS=""
MAXLINE="32768"
MAXMSG_STATUS="1024"
MAXMSG_CLIENT="1024"
MAXMSG_DATA="512"

hobbitlaunch.cfg:
[hobbitd]
    HEARTBEAT
    ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
    CMD hobbitd --pidfile=$BBSERVERLOGS/hobbitd.pid
        --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk
        --checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log
        --admin-senders=127.0.0.1,$BBSERVERIP
        --maint-senders=127.0.0.1,$BBSERVERIP -www-senders=127.0.0.1,$BBSERVERIP
        --store-clientlogs=!msgs --timeout=60
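
To see whether a particular message would run into one of the MAXMSG_* limits above, its size can be compared against the configured value. This is a tiny sketch under a stated assumption: that the MAXMSG_* settings are expressed in kB (so "1024" would mean 1 MB). The function name exceeds_limit is hypothetical, and whether the server treats the limit as inclusive is not something this check knows.

```python
def exceeds_limit(message, limit_kb):
    """Return True if the message (bytes) is larger than limit_kb kilobytes.

    Assumption: the MAXMSG_* values are in kB, i.e. units of 1024 bytes.
    """
    return len(message) > limit_kb * 1024
```

For example, a client report saved to a file could be read in binary mode and passed in with limit_kb=1024 to compare it against the MAXMSG_CLIENT value shown above.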
Any suggestions as to what is going wrong would be really appreciated.
BTW, bbnet tests from the 10.30.10.9 host are not submitted to the bbproxy at all because of the BBDISP setting in the hobbitserver.cfg, but if I change this to point to 10.30.10.9 then it seems to break the web interface. I'm not really too concerned about this right now though....
Thanks for any tips/pointers/etc
Regards,
Adam