Sorry I have to bother you again, but this afternoon I experienced the same bouncing effect problem again:
the server to be tested went down at 13:31. After 10 minutes conn worked again but after 5 minutes conn failed and did not come up afterwards.
Then the following happened:
At 14:01 cpu, procs, msgs, disk went purple. At 14:02 cpu, procs, msgs, disk went clear. At 14:06 cpu, procs, msgs, disk went purple. At 14:06 cpu, procs, msgs, disk went clear. At 14:11 cpu, procs, msgs, disk went purple. At 14:11 cpu, procs, msgs, disk went clear. At 14:16 cpu, procs, msgs, disk went purple. At 14:16 cpu, procs, msgs, disk went clear.
Then I blue dotted the test untill the machine went up again.
Perhaps I should add the following info: the Hobbit server gets it's data from a Big Brother server (yeah, it's the last week were Hobbit runs parallel to a BB-server; production next week after my week off). So it's not Hobbit who is performing the network tests, yet.
On the Hobbit server the bbd- and bbtest-columns are blue dotted and I have commented out the bbnet-test in the hobbitlaunch.cfg, so I thought the network test are not performed twice and everything should be working fine. Well, the mail actions do work :-]
Should I disable the bbretest-test as well?
Thank for your help,
Peter
2005/6/27, Peter Welter <peter.welter at gmail.com>:
Hi all,
Hope you cold shine a light on this one. I'm running Hobbit 4.0.4 on Solaris 8. I've enabled reporting as follows with macro's in hobbit-alert.cfg:
$UNIXDAY=MAIL someone at somewhere.nl TIME=*:0000:2359 REPEAT=1d RECOVERED
$UNIXNIGHT=MAIL someone at somewhere.nl TIME=*:0000:2359 DURATION>30m REPEAT=60m RECOVERED SERVICE=!cpu,!msgs COLOR=!yellow
This morning I found in my mailbox about 4000 mails reporting stopped & recovered. What happened was that the connection for 'host' failed. I've got 1 email of that event as it should. But mainly, until the connection re-established last night, a bouncing effect for other than the conn-test. I could build a dependancy (don't warn if no-conn) but that is rather annoying since I have many hosts to manage.
I just don't fully understand why the other tests are bouncing whereas the conn-test turns off on friday and on last night. See below:
... Bigbrother account Hobbit [705353] host:disk stopped reporting (PURPLE) za 25-06-05 6:04 1 KB Bigbrother account Hobbit [735704] host:msgs stopped reporting (PURPLE) za 25-06-05 6:04 1 KB Bigbrother account Hobbit [115710] host:procs stopped reporting (PURPLE) za 25-06-05 6:04 1 KB Bigbrother account Hobbit [963024] host:cpu stopped reporting (PURPLE) za 25-06-05 6:04 1 KB Bigbrother account Hobbit host:msgs recovered za 25-06-05 6:05 1 KB Bigbrother account Hobbit host:procs recovered za 25-06-05 6:05 1 KB Bigbrother account Hobbit host:disk recovered za 25-06-05 6:05 1 KB Bigbrother account Hobbit host:cpu recovered za 25-06-05 6:05 1 KB Bigbrother account Hobbit [982927] host:procs stopped reporting (PURPLE) za 25-06-05 6:09 1 KB Bigbrother account Hobbit [276744] host:disk stopped reporting (PURPLE) za 25-06-05 6:09 1 KB Bigbrother account Hobbit [850873] host:msgs stopped reporting (PURPLE) za 25-06-05 6:09 1 KB Bigbrother account Hobbit [612113] host:cpu stopped reporting (PURPLE) za 25-06-05 6:09 1 KB
The process check for example turn purple, of course, but then a clear status appears:
... Sun Jun 26 20:09:44 2005 clear 0:04:43 Sun Jun 26 20:09:27 2005 purple 0:00:17 Sun Jun 26 20:04:32 2005 clear 0:04:55 Sun Jun 26 20:04:27 2005 purple 0:00:05 Sun Jun 26 20:00:22 2005 clear 0:04:05 Sun Jun 26 19:59:26 2005 purple 0:00:56 ...
Thanks,
Peter