Hi, I installed Hobbit around a week ago and am very impressed with it, but I have a question.
I am occasionally getting a set of emails from it telling me things have a status of green. These aren't after a state change as far as I can tell as there are no other mails before them.
I am getting these once every couple of days, not at any particular time.
Can anyone suggest why?
ta
Robin
On Fri, Feb 18, 2005 at 07:04:47PM +0000, Robin Wood wrote:
I installed hobbit around a week ago and am very impressed with it but I have a question.
I am occasionally getting a set of emails from it telling me things have a status of green. These aren't after a state change as far as I can tell as there are no other mails before them.
I am getting these once every couple of days, not at any particular time.
Sounds a bit odd, but I'd need some more information before trying to track it down.
Which version are you using ?
What's in the ~/data/acks/notifications.log file ?
What are your rules in hobbit-alerts.cfg for sending out alert- and recovery-messages ?
What does the history show for a host that you get one of these messages for ?
Regards, Henrik
The version is 4.0-RC1.
I monitor 3 external boxes and an internal one.
Here are the last 2 batches of entries from the notifications.log file:
Fri Feb 18 22:14:43 2005 another.domain.com.imap (xxx.xxx.xxx.xxx) robin at mydomain.com 1108764882 843
Fri Feb 18 22:14:43 2005 third.domain.com.http (xxx.xxx.xxx.xxx) robin at mydomain.com 1108764882 600
Fri Feb 18 22:14:43 2005 another.domain.com.ssh (xxx.xxx.xxx.xxx) robin at mydomain.com 1108764882 722
Fri Feb 18 22:14:43 2005 internal.domain.int.conn (192.168.0.8) robin at mydomain.com 1108764882 500
Fri Feb 18 22:14:43 2005 another.domain.com.http (xxx.xxx.xxx.xxx) robin at mydomain.com 1108764882 600
Fri Feb 18 22:14:43 2005 internal.domain.int.http (192.168.0.8) robin at mydomain.com 1108764882 600
Fri Feb 18 22:14:43 2005 internal.domain.int.ssh (192.168.0.8) robin at mydomain.com 1108764882 722
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.bbd (192.168.0.8) robin at mydomain.com 1108764882 0
Fri Feb 18 22:14:43 2005 internal.domain.int.smtp (192.168.0.8) robin at mydomain.com 1108764882 725
Fri Feb 18 22:14:43 2005 another.domain.com.smtp (xxx.xxx.xxx.xxx) robin at mydomain.com 1108764882 725
Fri Feb 18 22:14:43 2005 third.domain.com.ftp (xxx.xxx.xxx.xxx) robin at mydomain.com 1108764882 721
Fri Feb 18 22:14:43 2005 third.domain.com.conn (xxx.xxx.xxx.xxx) robin at mydomain.com 1108764882 500
Fri Feb 18 22:14:43 2005 another.domain.com.conn (xxx.xxx.xxx.xxx) robin at mydomain.com 1108764882 500
Fri Feb 18 22:14:43 2005 internal.domain.int.rpc (192.168.0.8) robin at mydomain.com 1108764882 0
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.conn (192.168.0.8) robin at mydomain.com 1108764882 500
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.http (192.168.0.8) robin at mydomain.com 1108764882 600
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.ssh (192.168.0.8) robin at mydomain.com 1108764882 722
Fri Feb 18 22:14:43 2005 internal.domain.int.imap (192.168.0.8) robin at mydomain.com 1108764882 843
Fri Feb 18 22:14:43 2005 internal.domain.int.dns (192.168.0.8) robin at mydomain.com 1108764882 800
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.bbtest (192.168.0.8) robin at mydomain.com 1108764882 0
Sat Feb 19 05:45:08 2005 third.domain.com.ftp (xxx.xxx.xxx.xxx) robin at mydomain.com 1108791908 721
Sat Feb 19 05:45:08 2005 third.domain.com.http (xxx.xxx.xxx.xxx) robin at mydomain.com 1108791908 600
Sat Feb 19 05:45:08 2005 another.domain.com.http (xxx.xxx.xxx.xxx) robin at mydomain.com 1108791908 600
Sat Feb 19 05:45:08 2005 another.domain.com.smtp (xxx.xxx.xxx.xxx) robin at mydomain.com 1108791908 725
Sat Feb 19 05:45:08 2005 another.domain.com.ssh (xxx.xxx.xxx.xxx) robin at mydomain.com 1108791908 722
Sat Feb 19 05:45:08 2005 internal.domain.int.http (192.168.0.8) robin at mydomain.com 1108791908 600
Sat Feb 19 05:45:08 2005 another.domain.com.imap (xxx.xxx.xxx.xxx) robin at mydomain.com 1108791908 843
Sat Feb 19 05:45:09 2005 internal.domain.int.rpc (192.168.0.8) robin at mydomain.com 1108791908 0
Sat Feb 19 05:45:09 2005 internal.domain.int.ssh (192.168.0.8) robin at mydomain.com 1108791908 722
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.ssh (192.168.0.8) robin at mydomain.com 1108791908 722
Sat Feb 19 05:45:09 2005 internal.domain.int.smtp (192.168.0.8) robin at mydomain.com 1108791908 725
Sat Feb 19 05:45:09 2005 internal.domain.int.imap (192.168.0.8) robin at mydomain.com 1108791908 843
Sat Feb 19 05:45:09 2005 internal.domain.int.dns (192.168.0.8) robin at mydomain.com 1108791908 800
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.bbd (192.168.0.8) robin at mydomain.com 1108791909 0
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.http (192.168.0.8) robin at mydomain.com 1108791909 600
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.bbtest (192.168.0.8) robin at mydomain.com 1108791909 0
Sat Feb 19 06:15:08 2005 another.domain.com.conn (xxx.xxx.xxx.xxx) robin at mydomain.com 1108793708 500
Sat Feb 19 06:15:08 2005 internal.domain.int.conn (192.168.0.8) robin at mydomain.com 1108793708 500
Sat Feb 19 06:15:08 2005 third.domain.com.conn (xxx.xxx.xxx.xxx) robin at mydomain.com 1108793708 500
Sat Feb 19 06:15:08 2005 alerts.mydomain.com.conn (192.168.0.8) robin at mydomain.com 1108793708 500
The only rule in the alerts file is
HOST=* MAIL robin at mydomain.com
The histories show that the status of most of them has been unchanged for the last 14 hours, which, counting back, is when the mails were sent out. The graphs seem to show a gap in monitoring from around 21:30 (just before the first set of notifications entered the log, but no mails were sent out) to around 04:30 (again just before the notifications entered the log).
I know that the servers do a log rotate but that is around midnight.
I can't understand why the status would have changed 14 hours ago and why there should be no log data for any period.
My update period is 30 mins. The rest of the install is virtually straight out of the box with nothing more than what the instructions say to change.
If you want any more info just ask.
Ta
Robin
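For anyone who wants to group entries like the ones quoted above into batches, here is a minimal Python sketch. It assumes each entry has the shape shown in this thread: a logged time, host.service, an IP address in parentheses, the recipient, a Unix timestamp, and a trailing number whose meaning is not explained in this thread. The layout is inferred from the quoted log rather than from Hobbit documentation, and the name parse_entry is made up for illustration.

```python
import re
from datetime import datetime, timezone

# Layout inferred from the entries quoted in this thread (an assumption,
# not a documented format).
ENTRY = re.compile(
    r"^(?P<logged>\w{3} \w{3} +\d{1,2} \d\d:\d\d:\d\d \d{4}) "
    r"(?P<target>\S+) \((?P<ip>[^)]*)\) "
    r"(?P<recipient>.+) (?P<epoch>\d+) (?P<code>\d+)$"
)

def parse_entry(line):
    """Split one notifications.log line into named fields, or return None."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    # The service name is taken to be the part after the last dot.
    host, _, service = m.group("target").rpartition(".")
    epoch = int(m.group("epoch"))
    return {
        "logged": m.group("logged"),
        "host": host,
        "service": service,
        "ip": m.group("ip"),
        "recipient": m.group("recipient"),
        "epoch": epoch,
        "epoch_utc": datetime.fromtimestamp(epoch, tz=timezone.utc),
        "code": int(m.group("code")),  # meaning not stated in the thread
    }

sample = ("Fri Feb 18 22:14:43 2005 another.domain.com.imap "
          "(xxx.xxx.xxx.xxx) robin at mydomain.com 1108764882 843")
entry = parse_entry(sample)
print(entry["host"], entry["service"], entry["epoch_utc"].isoformat())
```

Decoding the numeric field this way places it one second before the logged time, which suggests it is a Unix timestamp and that the log times are in GMT.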
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
Just a bit to add to this: the things that are alerting as green show up in the monitor as green smilies, while the rest, which aren't alerting their green status, are green diamonds.
Does this matter?
Robin
On Fri, Feb 18, 2005 at 07:04:47PM +0000, Robin Wood wrote:
I am occasionally getting a set of emails from it telling me things have a status of green. These aren't after a state change as far as I can tell as there are no other mails before them.
I think I've resolved this in the RC4 release that will be available shortly. I would appreciate it if you would try it out and let me know if this problem is solved.
Regards, Henrik
Yes, I'll check it out. What was wrong?
Just some extra info, this is the top of a mail I was getting:
Subject: BB [182299] otherdomain.com:ssh stopped reporting to BB
Date: Mon, 28 Feb 2005 06:23:42 +0000 (GMT)
From: bb at mydomain.co.uk (BigBrother)
green <!-- [flags:OrdastILe] --> Mon Feb 28 05:53:31 2005 ssh ok
Service ssh on otherdomain.com is OK (up)
It claims that it stopped reporting but gave me a green status.
Robin
On Mon, Feb 28, 2005 at 08:35:09AM +0000, Robin Wood wrote:
Just some extra info, this is the top of a mail I was getting:
Subject: BB [182299] otherdomain.com:ssh stopped reporting to BB
Date: Mon, 28 Feb 2005 06:23:42 +0000 (GMT)
From: bb at mydomain.co.uk (BigBrother)
green <!-- [flags:OrdastILe] --> Mon Feb 28 05:53:31 2005 ssh ok
OK, this isn't a "green" mail - it's purple! The clue is the subject "otherdomain.com:ssh stopped reporting". The "green" is just the last status report that was sent before it stopped reporting any further status.
Henrik
I've just put RC4 on, so I'll see if anything does get fixed. Two questions though. First, why would things stop reporting? I'm monitoring 3 different boxes, one local and 2 remote on different hosts; what constitutes "stopping reporting"? My internet connection is up all night so it can't be that, especially as one box is the box that has the monitor on it.
The other is why are some of my green entries smilies and others diamonds?
Ta
Robin
On Thu, Mar 03, 2005 at 08:19:38PM +0000, Robin Wood wrote:
I've just put rc4 on so I'll see if anything does get fixed
Do pick up the post-RC4 patch, it has the final fix for the green mails. http://www.hswn.dk/beta/post-RC4.patch
two questions though, first why would things stop reporting?
Most common cause: The server was rebooted, and the client was not set up to restart automatically after a boot.
I'm monitoring 3 different boxes, one local, 2 remote on different hosts, what constitutes "stopping reporting"?
A status in Hobbit (and BB) has a lifetime - default is 30 minutes. Normally a status is refreshed every 5 minutes, so it stays "alive". If Hobbit sees that a status has not been updated for so long that its lifetime has been exceeded, it goes into the "stopped reporting" (purple) state.
The other is why are some of my green entries smilies and others diamonds?
Smilies mean the color has changed within the past 24 hours.
Henrik
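The two rules Henrik describes here (a status expires to purple once its lifetime passes without an update, and a smiley marks a color change within the past 24 hours) can be sketched in Python as follows. This is only an illustration of the rules as stated in this thread, not Hobbit's actual code; the function names and the "smiley"/"diamond" labels are made up for the example.

```python
from datetime import datetime, timedelta

STATUS_LIFETIME = timedelta(minutes=30)  # default lifetime quoted in this thread
RECENT_CHANGE = timedelta(hours=24)      # smiley window quoted in this thread

def displayed_color(last_color, last_update, now, lifetime=STATUS_LIFETIME):
    """Return the color to show: purple once the status has outlived its lifetime."""
    if now - last_update > lifetime:
        return "purple"
    return last_color

def icon_kind(last_color_change, now):
    """Return 'smiley' for a color change within 24 hours, else 'diamond'."""
    return "smiley" if now - last_color_change <= RECENT_CHANGE else "diamond"

now = datetime(2005, 3, 3, 12, 0, 0)
print(displayed_color("green", now - timedelta(minutes=5), now))
print(displayed_color("green", now - timedelta(minutes=31), now))
print(icon_kind(now - timedelta(hours=3), now))
print(icon_kind(now - timedelta(days=2), now))
```

Under this model a status that last reported green keeps that text in its final report even after it has expired, which is why a "stopped reporting" mail can carry a green status line.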
On Thu, 3 Mar 2005 23:15:41 +0100, Henrik Stoerner <henrik at hswn.dk> wrote:
On Thu, Mar 03, 2005 at 08:19:38PM +0000, Robin Wood wrote:
I've just put rc4 on so I'll see if anything does get fixed
Do pickup the post-RC4 patch, it has the final fix for the green mails. http://www.hswn.dk/beta/post-RC4.patch
two questions though, first why would things stop reporting?
Most common cause: The server was rebooted, and the client was not set up to restart automatically after a boot.
None of the boxes get rebooted, they are all live servers running 24/7, two with ISPs and one my own which I know the uptime of.
I'm monitoring 3 different boxes, one local, 2 remote on different hosts, what constitutes "stopping reporting"?
A status in Hobbit (and BB) has a lifetime - default is 30 minutes. Normally a status is refreshed every 5 minutes, so it stays "alive". If Hobbit sees that a status has not been updated for so long that its lifetime has been exceeded, it goes into the "stopped reporting" (purple) state.
I've never seen anything actually go purple when the mails were sent out but I don't watch it all the time so it could have done.
The other is why are some of my green entries smilies and others diamonds?
Smilies mean the color has changed within the past 24 hours.
ok sounds reasonable that if it sends out the mails then it is because it thinks the status has changed.
I was going to report that RC4 had fixed it as I'd had no mails but then I got this:
- Program crashed
Fatal signal caught!
on the hobbit-alert monitor so I guess that may be why I hadn't got any.
I'll put the other patch on and see what happens.
Robin
One other thing I did think of is that I set my monitor period to be 30 mins, could that have anything to do with it, something to do with the time to live and the refresh period being the same?
RC5 is unfortunately still causing random "x stopped reporting" errors. I just got 20 mails similar to this one:
Subject: BB [431703] mydomain.int:imap stopped reporting to BB
Date: Mon, 7 Mar 2005 22:02:52 +0000 (GMT)
green Mon Mar 7 21:32:41 2005 imap ok
Service imap on mydomain.int is OK (up)
- OK [CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT THREAD=REFERENCES SORT QUOTA IDLE ACL ACL2=UNION STARTTLS] Courier-IMAP ready. Copyright 1998-2004 Double Precision, Inc. See COPYING for distribution information.
- BYE Courier-IMAP server shutting down ABC123 OK LOGOUT completed
Seconds: 0.01
This is for the IMAP server on the same box as the monitor, so there could be no network or connection issues. Does anyone have any ideas of anything else to try?
On the bright side, it is happening less frequently.
Robin
On Mon, Mar 07, 2005 at 10:35:04PM +0000, Robin Wood wrote:
RC5 is unfortunately still causing random "x stopped reporting" errors. I just got 20 mails similar to this one:
Subject: BB [431703] mydomain.int:imap stopped reporting to BB
Date: Mon, 7 Mar 2005 22:02:52 +0000 (GMT)
green Mon Mar 7 21:32:41 2005 imap ok
Well, that report is more than 30 minutes old - the report is from Mar 7 21:32, and the alert is dated Mar 7 22:02.
You mentioned that
One other thing I did think of is that I set my monitor period to be 30 mins, could that have anything to do with it, something to do with the time to live and the refresh period being the same?
What exactly is it that you've changed? I don't quite follow what you mean by "monitor period".
What's the "interval" setting in hobbitlaunch.cfg for the [bbnet] task?
Regards, Henrik
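Henrik's observation about the two timestamps can be checked with a few lines of Python. This sketch assumes the report time in the mail body is in the same time zone as the Date header (GMT), which the thread does not state explicitly.

```python
from datetime import datetime, timedelta

# Times copied from the mail quoted in this thread; both are assumed to be GMT.
last_report = datetime(2005, 3, 7, 21, 32, 41)  # "green Mon Mar 7 21:32:41 2005 imap ok"
alert_sent = datetime(2005, 3, 7, 22, 2, 52)    # "Date: Mon, 7 Mar 2005 22:02:52 +0000"

lifetime = timedelta(minutes=30)  # default status lifetime quoted in this thread
age = alert_sent - last_report

print("status age:", age)
print("older than lifetime:", age > lifetime)
```

The last report was 30 minutes and 11 seconds old when the alert went out, so it had outlived the 30-minute lifetime by only a few seconds.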
This is the setting I have in hobbitlaunch.cfg
[bbnet]
ENVFILE /home/bb/server/etc/hobbitserver.cfg
NEEDS hobbitd
CMD bbtest-net --report --ping --checkresponse
LOGFILE $BBSERVERLOGS/bb-network.log
INTERVAL 30m
I am wondering if the problem is that sometimes this isn't getting its data in before the alerter tries to pick the data up, so the data is slightly over 30 minutes old, which causes the alerts to be sent out.
On Tue, Mar 08, 2005 at 01:52:02PM +0000, Robin Wood wrote:
This is the setting I have in hobbitlaunch.cfg
[bbnet]
ENVFILE /home/bb/server/etc/hobbitserver.cfg
NEEDS hobbitd
CMD bbtest-net --report --ping --checkresponse
LOGFILE $BBSERVERLOGS/bb-network.log
INTERVAL 30m
I am wondering if the problem is that sometimes this isn't getting its data in before the alerter tries to pick the data up, so the data is slightly over 30 minutes old, which causes the alerts to be sent out.
Yep, that is it. Network tests have a lifetime of 30 minutes before they go purple, so if you only run the network tests at 30-minute intervals, there are bound to be some occasions where the network tests fail to update the status before the go-purple triggers.
Just don't set the interval that high - problem fixed.
Regards, Henrik
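To see why an interval equal to the lifetime is fragile, here is a small Python sketch. It assumes each test run delivers its status update a few seconds after its scheduled start; the delay values are invented for illustration, and the function name stale_gaps is hypothetical.

```python
def stale_gaps(interval_s, lifetime_s, run_delays_s):
    """Return (previous_update, next_update) pairs whose gap exceeds the lifetime.

    run_delays_s[i] is how many seconds after its scheduled start run i
    delivered its status update (made-up numbers for illustration).
    """
    updates = [i * interval_s + delay for i, delay in enumerate(run_delays_s)]
    return [(a, b) for a, b in zip(updates, updates[1:]) if b - a > lifetime_s]

LIFETIME = 30 * 60   # 30-minute status lifetime quoted in this thread
delays = [5, 16, 8]  # hypothetical per-run delays in seconds

for minutes in (30, 25, 5):
    gaps = stale_gaps(minutes * 60, LIFETIME, delays)
    print(f"INTERVAL {minutes}m: {len(gaps)} update gap(s) longer than the lifetime")
```

With these numbers only the 30-minute interval produces a gap longer than the lifetime, because any run that finishes slightly later than the one before it pushes the gap past 30 minutes. A 25-minute interval leaves about five minutes of slack, and the 5-minute refresh Henrik mentions leaves far more.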
I'll drop it to 25 mins and that should fix it.
Ta
Robin
It hasn't been completely resolved in RC4. I found a bug in the way recovery messages were being handled that could trigger these to go out when they should not, and thought that might be the cause of the problem.
Kevin Hanrahan actually found another reason you may get an unexpected green mail - if you have set up alerts to be sent out only on red (COLOR=red), you won't get an alert when it goes yellow (obviously). But you will get the recovery notice when it goes back to green! I'm working on that one, but need to do some more testing later today before I send out the fix.
Regards, Henrik
On Mon, Feb 28, 2005 at 08:33:09AM +0000, Robin Wood wrote:
ye, I'll check it out. What was wrong?
-- Henrik Storner