"hobbitd status-board not available" from bbgen on solaris 10
If anyone has been having issues with bbgen logging this error message on Solaris 10 and intermittently failing, resulting in blank status pages, I think I have found a workaround.
If you disable TCP fusion by adding the following kernel parameter to /etc/system and rebooting, hopefully you will find that the problem goes away.
set ip:do_tcp_fusion = 0
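For anyone who would rather script the /etc/system change, here is a minimal sketch, assuming bash is available (as in the bash-3.00 sessions elsewhere in this thread). The function name disable_tcp_fusion is hypothetical and not part of Hobbit or Solaris. It keeps a backup copy and leaves the file alone if a do_tcp_fusion line is already present, in which case that line should be checked by hand.

```shell
#!/bin/bash
# Hypothetical helper: append the TCP fusion setting to a system file
# unless a "set ip:do_tcp_fusion" line is already present.
# Defaults to /etc/system; pass another path to try it on a copy first.
disable_tcp_fusion() {
    local file=${1:-/etc/system}
    # An existing line may hold a different value, so only report it.
    if grep '^set ip:do_tcp_fusion' "$file" >/dev/null 2>&1; then
        echo "exists"
        return 0
    fi
    # Keep a backup before touching the file.
    cp -p "$file" "$file.bak" || return 1
    echo 'set ip:do_tcp_fusion = 0' >> "$file" || return 1
    echo "added"
}
```

Running it against a copy of /etc/system first is a cheap way to see what it would do. A reboot is still needed for the setting to take effect.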
Apparently this can also be done on a live system (without rebooting), but hobbit will then need to be restarted. To do this:
echo do_tcp_fusion/W0 | mdb -kw
TCP fusion is only used on local loopback connections, to speed them up by bypassing the normal TCP stack. I found that the problem only occurred when connecting to hobbitd locally. I tried running "bb localhost hobbitdboard" once a second, and found it would often return no data, but if I ran the same command from another host against the hobbit server, it always returned correct data. This made me suspect TCP fusion, as I have run into issues with it before. It is best left disabled, in my opinion.
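The once-a-second test described above could be scripted roughly as follows. This is only a sketch, assuming bash and that the Hobbit "bb" client is on the PATH. The function name count_empty is made up for illustration. It runs a command a given number of times and reports how many runs produced no output at all.

```shell
#!/bin/bash
# Hypothetical helper: run a command N times, INTERVAL seconds apart,
# and print how many of the runs returned no output.
# Usage: count_empty N INTERVAL command [args...]
count_empty() {
    local n=$1 interval=$2
    shift 2
    local empty=0 i=0 out
    while [ "$i" -lt "$n" ]; do
        out=$("$@" 2>/dev/null)
        # An empty reply is what the failing local connections looked like.
        if [ -z "$out" ]; then
            empty=$((empty + 1))
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "$empty"
}

# Example, polling the local hobbitd once a second for a minute:
#   count_empty 60 1 bb localhost hobbitdboard
```

Running the same loop from another host, with the hobbit server's name in place of localhost, gives the comparison that pointed to TCP fusion in the first place.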
On Thu, Apr 19, 2007 at 12:30:23PM +0100, Colin Spargo wrote:
TCP fusion is only used on local loopback connections, to speed them up by bypassing the normal TCP stack. I found that the problem only occurred when connecting to hobbitd locally. I tried running "bb localhost hobbitdboard" once a second, and found it would often return no data, but if I ran the same command from another host against the hobbit server, it always returned correct data. This made me suspect TCP fusion, as I have run into issues with it before. It is best left disabled, in my opinion.
Very interesting, thanks. There have been some reports about problems on Solaris 10, and at one point I was suspecting an OS bug. Seems I was right.
Henrik
- stop hobbit server
- zero out the existing log file
- apply the online fix
- So far so good, I can confirm the status-board error message is now gone ;)
bash-3.00# grep -i status-board *.log
bash-3.00# pwd
/var/opt/hobbitserver42/log
bash-3.00# ls *.log
acknowledge.log   cgierror.log     hobbitlaunch.log   rrd-data.log
bb-display.log    clientdata.log   hobbitlaunch.pid   rrd-status.log
bb-network.log    history.log      hostdata.log
bb-retest.log     hobbitd.log      notifications.log
bbcombotest.log   hobbitd.pid      page.log
bash-3.00# cat /etc/release
Solaris 10 6/06 s10s_u2wos_09a SPARC
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 09 June 2006
bash-3.00#
Good job on tracking down the cause and providing the fix.
T.J. Yang
Good to hear!
A trawl through SunSolve shows a few bugs that may have something to do with it:
Bug ID: 6458410 Synopsis: read() may spuriously return EAGAIN while unfusing a TCP connection
No patch for this yet, I believe.
On 19/04/2007 17:46, "T.J. Yang" <tj_yang at hotmail.com> wrote:
- stop hobbit server
- zero out the existing log file
- apply the online fix
- So far so good, I can confirm the status-board error message is now gone ;)
participants (3)
- cspargo2@csc.com
- henrik@hswn.dk
- tj_yang@hotmail.com