Hello everybody,

For the past 7 months I have been using hobbit to monitor the availability of our VSE, VM and Linux systems running on an IBM z9. The hobbit server 4.2.0 runs on SLES10 SP2 on this same IBM z9. Everything worked well until last Friday. Then, on the Current Status screen, the icons for bbd, bbtest, http and the complete conn column changed from green to purple (no report).

After a system reboot everything seemed to be OK again, because all icons were green. But after some time the problem came back: the icons for bbd, bbtest, http and the complete conn column changed from green to purple (no report) again. When I log on to the hobbit server and run 'bbtest-net' manually, all purple icons change to green except the bbtest icon, which stays purple. After a while the icons mentioned above change to purple again.

So it looks to me as if bbtest-net is not being restarted after the 5-minute interval.

I have reduced the number of monitored hosts from 20 to 10 and added --concurrency=N to the bbtest-net command in hobbitlaunch.cfg, but the problem still exists.
Here is a part of my hobbitlaunch.cfg:

    # "bbnet" runs the bbtest-net tool to perform the network based tests -
    # i.e. http, smtp, ssh, dns and
    # all of the various network protocols we need to test.

    [bbnet]
        ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
        NEEDS hobbitd
        CMD bbtest-net --report --ping --checkresponse --concurrency=N
        LOGFILE $BBSERVERLOGS/bb-network.log
        INTERVAL 5m
These are my running hobbit processes:

    lx100:/usr/lib/hobbit/server/etc # ps afx|grep hobbit|grep -v grep
    14314 ?  Ss  0:04 /usr/lib/hobbit/server/bin/hobbitlaunch --config=/usr/lib/hobbit/server/etc/hobbitlaunch.cfg --env=/usr/lib/hobbit/server/etc/hobbitserver.cfg --log=/var/log/hobbit/hobbitlaunch.log --pidfile=/var/log/hobbit/hobbitlaunch.pid
    14315 ?  S   0:13  \_ hobbitd --pidfile=/var/log/hobbit/hobbitd.pid --restart=/usr/lib/hobbit/server/tmp/hobbitd.chk --checkpoint-file=/usr/lib/hobbit/server/tmp/hobbitd.chk --checkpoint-interval=600 --log=/var/log/hobbit/hobbitd.log --admin-senders=127.0.0.1 127.0.0.1 --store-clientlogs=!msgs
    14316 ?  S   0:00  \_ hobbitd_channel --channel=stachg --log=/var/log/hobbit/history.log hobbitd_history
    14317 ?  S   0:00  |   \_ hobbitd_history
    14318 ?  S   0:00  \_ hobbitd_channel --channel=clichg --log=/var/log/hobbit/hostdata.log hobbitd_hostdata
    14319 ?  S   0:00  |   \_ hobbitd_hostdata
    14320 ?  S   0:00  \_ hobbitd_channel --channel=page --log=/var/log/hobbit/page.log hobbitd_alert --checkpoint-file=/usr/lib/hobbit/server/tmp/alert.chk --checkpoint-interval=600
    14322 ?  S   0:01  |   \_ hobbitd_alert --checkpoint-file=/usr/lib/hobbit/server/tmp/alert.chk --checkpoint-interval=600
    14321 ?  S   0:00  \_ hobbitd_channel --channel=status --log=/var/log/hobbit/rrd-status.log hobbitd_rrd --rrddir=/var/lib/hobbit/rrd --extra-script=/usr/lib/hobbit/server/ext/zvmvse.sh --extra-tests=procs
    14323 ?  S   0:05  |   \_ hobbitd_rrd --rrddir=/var/lib/hobbit/rrd --extra-script=/usr/lib/hobbit/server/ext/zvmvse.sh --extra-tests=procs
    14324 ?  S   0:00  \_ hobbitd_channel --channel=data --log=/var/log/hobbit/rrd-data.log hobbitd_rrd --rrddir=/var/lib/hobbit/rrd
    14325 ?  S   0:01  |   \_ hobbitd_rrd --rrddir=/var/lib/hobbit/rrd
    14326 ?  S   0:00  \_ hobbitd_channel --channel=client --log=/var/log/hobbit/clientdata.log hobbitd_client
    14327 ?  S   0:02  |   \_ hobbitd_client
    13039 ?  S   0:00 sh -c vmstat 300 2 1>/usr/lib/hobbit/client/tmp/hobbit_vmstat.localhost.13022 2>&1; mv /usr/lib/hobbit/client/tmp/hobbit_vmstat.localhost.13022 /usr/lib/hobbit/client/tmp/hobbit_vmstat.localhost
    lx100:/usr/lib/hobbit/server/etc #
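One limitation of the process listing: it is filtered with grep hobbit, so a bbtest-net process whose command line does not contain the string "hobbit" would be dropped from the output even if it were still running. The following is a minimal sketch for looking specifically for a long-running bbtest-net. The helper name stale_procs is hypothetical and not part of hobbit, and the 300-second limit is simply the 5m INTERVAL from the config excerpt. Whether a hung bbtest-net is really the cause here is only an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of hobbit).
# Reads "PID ELAPSED COMMAND..." lines on stdin, where ELAPSED is the
# ps "etime" format [[dd-]hh:]mm:ss, and prints the lines whose command
# contains NAME and whose elapsed time exceeds LIMIT seconds.
# Usage: stale_procs NAME LIMIT
stale_procs() {
    awk -v name="$1" -v limit="$2" '
        {
            # Convert [[dd-]hh:]mm:ss to seconds.
            n = split($2, t, /[-:]/)
            secs = t[n] + 60 * t[n-1]
            if (n >= 3) secs += 3600 * t[n-2]
            if (n >= 4) secs += 86400 * t[n-3]

            # Strip the PID and ELAPSED fields to get the command text.
            cmd = $0
            sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", cmd)

            if (index(cmd, name) > 0 && secs > limit + 0) print
        }
    '
}
```

It could be used like this, with the trailing "=" signs suppressing the ps header line:

    ps -eo pid=,etime=,args= | stale_procs bbtest-net 300

If this prints a bbtest-net process that has been running for much longer than one interval, that process would be worth a closer look before resorting to a crontab entry.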
I cannot see any error messages, and there is enough free space in the filesystem. Is it necessary to create a crontab entry to start bbtest-net every 5 minutes?

My colleagues have assured me that they have not changed anything. I have no idea what happened. Any hint is appreciated.
Kind regards,
Horst Rempel
hrempel@bgchemie.de