On Wed, Jan 26, 2005 at 03:17:01PM -0700, Charles Jones wrote:
My production BigBrother server is running BigBrother + bbgen 2.5 (I know there is newer bbgen, I plan on replacing BB with a Hobbit server).
Wow, that's a pretty old bbgen version - 1½ years, in fact.
My current bb+bbgen setup has problems whenever a machine dies in such a way that it is pingable, but when you connect to any open TCP port you get nothing back (usually caused by a memory error or overheating). When my current bb+bbgen setup tries to test one of these machines that has zombified, it gets hung testing that host, and eventually everything turns purple since bb isn't updating anymore.
Does Hobbit have proper timeouts to abort a hung TCP connection so this sort of thing does not happen?
If not, then it's definitely a bug. All network tests done by Hobbit must time out if the other end doesn't respond. The default timeout is 10 seconds (set with the "--timeout=N" option to bbtest-net).
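To illustrate the general idea of a bounded network test, here is a minimal Python sketch. It is not Hobbit's or bbtest-net's code; the function name probe_tcp and its parameters are made up for this example. It assumes the "zombie" case described above: the host accepts the TCP connection but never sends anything back.

```python
import socket

def probe_tcp(host, port, timeout=10.0, expect_banner=False):
    """Hypothetical helper: return 'ok', 'timeout' or 'error' for one TCP check."""
    try:
        # The timeout bounds the connect() and is inherited by the
        # returned socket, so the recv() below is bounded as well.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if expect_banner:
                data = sock.recv(128)
                if not data:
                    return "error"  # peer closed without sending anything
            return "ok"
    except socket.timeout:
        # Either the connect or the read exceeded the timeout.
        return "timeout"
    except OSError:
        # Connection refused, host unreachable, and similar failures.
        return "error"
```

Note that the timeout here applies to each socket operation separately, so a check that connects and then reads can take up to roughly twice the timeout in the worst case. The point is only that a host which is pingable but silent yields a "timeout" result instead of blocking the tester forever.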
Looking back through the bbgen changelog, there are a couple of bugfixes through the 2.x series that seem likely to fix it. But without knowing exactly what's triggering this behaviour it is hard to say for sure.
Henrik