bbgen frequent yellow alerts - hobbitd problem?
Hi,
We are running a new installation of Hobbit 4.2 on Solaris 10 running in a non-global zone. Server is a v240 but I don't think that matters here.
The problem here is that our bbgen status turns yellow with fairly high frequency, sometimes multiple times an hour, at (what seem like) random intervals. In the yellow alert bbgen reports: "hobbitd status-board not available"
During this time the hobbitd daemon is still running and the next time that bbgen runs the alert (usually) turns green. I've tested this by running bbgen every second, every 15 seconds, and every minute. The same is also true if I run bbgen by hand.
During the 'yellow alert' time window, bb2.html gets updated with "All Monitored Systems OK" even though not all monitored systems are OK. When the status turns green again, this page reflects the correct status for the non-green systems.
Below is the output from some commands/logs. These logs don't really seem to help, so let me know if there is anything else that I can send along to debug this issue.
Any help is appreciated - we're near the point of frustration where we may have to pull the plug on Hobbit and go back to our old BB installation.
Thanks in advance. -Jon
(logs below)
hobbitd log from --debug. Way less entries here than normal.
2006-11-03 10:54:00 -> do_message/1 (12 bytes): hobbitdboard
2006-11-03 10:54:00 -> update_statistics
2006-11-03 10:54:00 <- update_statistics
2006-11-03 10:54:00 -> oksender
2006-11-03 10:54:00 <- oksender(1-a)
2006-11-03 10:54:00 -> setup_filter: hobbitdboard
2006-11-03 10:54:00 <- setup_filter: hobbitdboard
2006-11-03 10:54:00 <- do_message/1
2006-11-03 10:54:01 -> do_message/1 (0 bytes):
2006-11-03 10:54:01 -> update_statistics
2006-11-03 10:54:01 <- update_statistics
2006-11-03 10:54:01 <- do_message/1
$BB --debug $BBDISP "hobbitdboard" (with no --debug on a 'failure' I get no output. I'm assuming this is the same cause of the bbgen yellow alert)
2006-11-03 10:54:01 Transport setup is:
2006-11-03 10:54:01 bbdportnumber = 1984
2006-11-03 10:54:01 bbdispproxyhost = NONE
2006-11-03 10:54:01 bbdispproxyport = 0
2006-11-03 10:54:01 Recipient listed as '10.xxx.xxx.xxx'
2006-11-03 10:54:01 Standard BB protocol on port 1984
2006-11-03 10:54:01 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-03 10:54:01 Connect status is 0
2006-11-03 10:54:01 Sent 12 bytes
2006-11-03 10:54:01 Closing connection
bbgen --debug --report (this one turned bbgen yellow/unavailable. Note the quick disconnect.)
2006-11-03 09:51:03 load_state()
2006-11-03 09:51:03 Transport setup is:
2006-11-03 09:51:03 bbdportnumber = 1984
2006-11-03 09:51:03 bbdispproxyhost = NONE
2006-11-03 09:51:03 bbdispproxyport = 0
2006-11-03 09:51:03 Recipient listed as '10.xxx.xxx.xxx'
2006-11-03 09:51:03 Standard BB protocol on port 1984
2006-11-03 09:51:03 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-03 09:51:03 Connect status is 0
2006-11-03 09:51:03 Sent 126 bytes
2006-11-03 09:51:03 Closing connection
bbgen --debug --report (this one worked fine)
2006-11-03 09:54:00 load_state()
2006-11-03 09:54:00 Transport setup is:
2006-11-03 09:54:00 bbdportnumber = 1984
2006-11-03 09:54:00 bbdispproxyhost = NONE
2006-11-03 09:54:00 bbdispproxyport = 0
2006-11-03 09:54:00 Recipient listed as '10.xxx.xxx.xxx'
2006-11-03 09:54:00 Standard BB protocol on port 1984
2006-11-03 09:54:00 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-03 09:54:00 Connect status is 0
2006-11-03 09:54:00 Sent 126 bytes
2006-11-03 09:54:00 Read 16384 bytes
2006-11-03 09:54:00 Read 32767 bytes
2006-11-03 09:54:00 Read 1 bytes
2006-11-03 09:54:00 Read 32767 bytes
2006-11-03 09:54:00 Read 32767 bytes
2006-11-03 09:54:00 Read 24578 bytes
2006-11-03 09:54:00 Read 32767 bytes
2006-11-03 09:54:00 Read 32767 bytes
2006-11-03 09:54:00 Read 24578 bytes
2006-11-03 09:54:00 Read 16503 bytes
2006-11-03 09:54:00 Closing connection
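The failing and working bbgen runs above differ only in what follows "Sent 126 bytes": the working run logs a series of "Read N bytes" lines, while the failing run closes the connection right away. A minimal C sketch of a probe that makes this visible by counting the response bytes is given here. It is not Hobbit code: probe_board is a hypothetical helper, it expects an already-connected stream socket, and it assumes the server accepts a half-closed connection as the end of the request.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of Hobbit): send one request on an
 * already-connected stream socket, half-close the sending side, then
 * count the response bytes until the peer closes the connection.
 * Returns the number of bytes received, or -1 on a socket error.
 */
long probe_board(int sockfd, const char *request)
{
    const char *p = request;
    size_t left = strlen(request);
    char buf[4096];
    long total = 0;

    /* Send the whole request, retrying on partial writes. */
    while (left > 0) {
        ssize_t n = send(sockfd, p, left, 0);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Assumption: the server treats our half-close as end-of-request. */
    if (shutdown(sockfd, SHUT_WR) < 0) return -1;

    /* Read until the peer closes its side; recv() == 0 marks that point. */
    for (;;) {
        ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
        if (n > 0) { total += n; continue; }
        if (n == 0) break;
        if (errno == EINTR) continue;
        return -1;
    }
    return total;
}
```

A return value of 0 would correspond to the failing case: the request went out, but the server closed the connection without sending a board.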
On Mon, Nov 06, 2006 at 07:35:27AM -0800, Mr-Pope wrote:
> We are running a new installation of Hobbit 4.2 on Solaris 10 running in a non-global zone. Server is a v240 but I don't think that matters here.
> The problem here is that our bbgen status turns yellow with fairly high frequency, sometimes multiple times an hour, at (what seem like) random intervals. In the yellow alert bbgen reports: "hobbitd status-board not available"
The reports I've had of this only have one thing in common: They all happen on Solaris 10. So I'm beginning to suspect that maybe Solaris doesn't work quite the way other systems do.
Or perhaps there is a bug, and something special in Solaris triggers it.
> Below is the output from some commands/logs. These logs don't really seem to help, so let me know if there is anything else that I can send along to debug this issue.
> $BB --debug $BBDISP "hobbitdboard" (with no --debug on a 'failure' I get no output. I'm assuming this is the same cause of the bbgen yellow alert)
Yes.
> bbgen --debug --report (this one turned bbgen yellow/unavailable. Note the quick disconnect.)
> 2006-11-03 09:51:03 load_state()
> 2006-11-03 09:51:03 Transport setup is:
> 2006-11-03 09:51:03 bbdportnumber = 1984
> 2006-11-03 09:51:03 bbdispproxyhost = NONE
> 2006-11-03 09:51:03 bbdispproxyport = 0
> 2006-11-03 09:51:03 Recipient listed as '10.xxx.xxx.xxx'
> 2006-11-03 09:51:03 Standard BB protocol on port 1984
> 2006-11-03 09:51:03 Will connect to address 10.xxx.xxx.xxx port 1984
> 2006-11-03 09:51:03 Connect status is 0
> 2006-11-03 09:51:03 Sent 126 bytes
> 2006-11-03 09:51:03 Closing connection
Interesting.
Since it seems that this bites you more than most others, I'd like you to do a couple of things for me to figure out what is going on. I need you to add a couple of debugging lines to Hobbit.
First, in the bbdisplay/loaddata.c file, around line 436 you'll find the code that prints out the "hobbitd status board not available" message. It looks like this:
    errprintf("hobbitd status-board not available\n");
I want you to change that to
    errprintf("hobbitd status-board not available, code %d\n", hobbitdresult);
Next, in the lib/sendmsg.c file around line 340 is where the code is that receives data from Hobbit. You'll find these lines:
    n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);
    if (n > 0) {
I'd like you to add 8 lines between these two:
    n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);
    if (n < 0) {
        dbgprintf("recv() returned error: %s\n", strerror(errno));
        if (errno == EAGAIN) continue;
    }
    if (n == 0) {
        dbgprintf("recv() gave us 0 bytes\n");
        continue;
    }
    if (n > 0) {
(it isn't the prettiest of programming, but it does the job for now).
After making these two changes, run "make clean; make" and copy the bbdisplay/bbgen binary into your ~hobbit/server/bin/ directory. Let Hobbit run as normal (with --debug on the bbgen command) and when it fails I am very interested to see what's in the logfile.
Regards, Henrik
Oops - one line too many among those extra ones I sent you:
On Mon, Nov 06, 2006 at 05:29:59PM +0100, Henrik Stoerner wrote:
> I'd like you to add 8 lines between these two:
>
>     n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);
>     if (n < 0) {
>         dbgprintf("recv() returned error: %s\n", strerror(errno));
>         if (errno == EAGAIN) continue;
>     }
>     if (n == 0) {
>         dbgprintf("recv() gave us 0 bytes\n");
>         continue;
          ^^^^^^^^^ don't put this "continue" line in there.
Regards, Henrik
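The reason for dropping that "continue" can be made explicit: on a stream socket, recv() returning 0 means the peer has closed its side in an orderly way, so going around the loop again would only see 0 again. A small sketch of the outcomes the debug lines distinguish follows. The names classify_recv and recv_outcome_t are made up for illustration and are not in lib/sendmsg.c. Treating EWOULDBLOCK and EINTR like EAGAIN is an addition of this sketch; the patch itself checks only EAGAIN.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical classification of a recv() return value. */
typedef enum {
    RECV_DATA,   /* n > 0: payload bytes arrived */
    RECV_EOF,    /* n == 0: peer closed the connection, stop reading */
    RECV_RETRY,  /* n < 0 with a transient errno: try again */
    RECV_ERROR   /* n < 0 with any other errno: give up */
} recv_outcome_t;

/* n is the value recv() returned; err is errno captured right after it. */
recv_outcome_t classify_recv(ssize_t n, int err)
{
    if (n > 0) return RECV_DATA;
    if (n == 0) return RECV_EOF;
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR) return RECV_RETRY;
    return RECV_ERROR;
}
```

Only RECV_RETRY justifies a "continue"; RECV_EOF has to fall through to whatever ends the read loop.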
Are there many problems encountered when using Solaris with Hobbit? I was going to base my design on Solaris, but I am now wondering if I should steer more towards an AIX solution.
This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind CSC to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose.
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
Solaris works great once you get past the linker problems. That is well-known territory if you read the hints.
From: Matthew G Armstrong [mailto:marmstrong5 at csc.com]
Sent: Monday, November 06, 2006 10:39 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] bbgen frequent yellow alerts - hobbitd problem?
> Are there many problems encountered when using Solaris with Hobbit? I was going to base my design on Solaris, but I am now wondering if I should steer more towards an AIX solution.
Hi Matt,
(always good to see a fellow csc'er on the list :-))
On Mon, Nov 06, 2006 at 04:39:22PM +0000, Matthew G Armstrong wrote:
> Are there many problems encountered when using Solaris with Hobbit? I was going to base my design on Solaris, but I am now wondering if I should steer more towards an AIX solution.
I don't think Solaris is a problem. Most likely it's just my code which doesn't work quite the way it should. I believe there are lots of Hobbit users out there running on Solaris.
Regards, Henrik
Thanks, Henrik.
I made the changes that you suggested and copied bbgen to the appropriate directory. When we get yellow alerts the following is the new "Error output":
hobbitd status-board not available, code 0
In the log I get the following messages during the connection process:
*cut*
2006-11-06 09:59:57 load_state()
2006-11-06 09:59:57 Transport setup is:
2006-11-06 09:59:57 bbdportnumber = 1984
2006-11-06 09:59:57 bbdispproxyhost = NONE
2006-11-06 09:59:57 bbdispproxyport = 0
2006-11-06 09:59:57 Recipient listed as '10.xxx.xxx.xxx'
2006-11-06 09:59:57 Standard BB protocol on port 1984
2006-11-06 09:59:57 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-06 09:59:57 Connect status is 0
2006-11-06 09:59:57 Sent 126 bytes
2006-11-06 09:59:57 recv() gave us 0 bytes
2006-11-06 09:59:57 Closing connection
*cut*
2006-11-06 09:59:58 Recipient listed as '10.xxx.xxx.xxx'
2006-11-06 09:59:58 Standard BB protocol on port 1984
2006-11-06 09:59:58 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-06 09:59:58 Connect status is 0
2006-11-06 09:59:58 Sent 1384 bytes
2006-11-06 09:59:58 Closing connection
2006-11-06 09:59:58 1 status messages merged into 2 transmissions
*end*
I did not see the impact of the changes to sendmsg.c anywhere in the debug output.
-Jon
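One possible reading of these results, stated as an assumption rather than a diagnosis: if code 0 is the "no error" result of the send routine, then the connection itself succeeded and the message was triggered by an empty response, which would match the "recv() gave us 0 bytes" line in the log. The sketch below shows a check of that shape. board_usable and RESULT_OK are hypothetical names, and the real condition in bbdisplay/loaddata.c may differ.

```c
#include <stddef.h>

/* Assumption: 0 is the "no transport error" result code. */
#define RESULT_OK 0

/*
 * Hypothetical check: the board is usable only if the transfer
 * reported no error AND the response actually contains data.
 * Returns 1 when usable, 0 otherwise.
 */
int board_usable(int result, const char *board)
{
    return result == RESULT_OK && board != NULL && board[0] != '\0';
}
```

Under that assumption, a result of 0 together with an empty board still fails the check, which is the combination reported above.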
Henrik,
It might be worth checking whether these problems occur only on Solaris 10 x86, as that is the only architecture I've seen this problem on. SPARC seems fine, so that might help you narrow down the problem.
Regards,
Mike Rowell
This email has been scanned for all viruses by the MessageLabs service.
I have seen it with Solaris 10 SPARC. Solaris 9 SPARC was fine.
Mike,
This problem is happening on a sparc box.
-Jon
-- A FLASH FLOOD WATCH MEANS FLASH FLOODING IS POSSIBLE IN THE WATCH AREA.
participants (6)
- greg.hubbard@eds.com
- henrik@hswn.dk
- marmstrong5@csc.com
- Mike.Rowell@Rightmove.co.uk
- pope8086@gmail.com
- rdeal@tigr.ORG