Increasing number of hobbitd zombies
Hi,
Every "checkpoint-interval" I get a new hobbitd zombie process.
#> ps auxwww
----snip ---
hobbit 25559 0.0 0.0 0 0 ? Z 11:06 0:00 [hobbitd] <defunct>
hobbit 25917 0.0 0.0 0 0 ? Z 11:16 0:00 [hobbitd] <defunct>
hobbit 26283 0.0 0.0 0 0 ? Z 11:26 0:00 [hobbitd] <defunct>
hobbit 26648 0.0 0.0 0 0 ? Z 11:36 0:00 [hobbitd] <defunct>
----snip ---
hobbitlaunch.cfg: [hobbitd] section:
CMD hobbitd --debug --pidfile=$BBSERVERLOGS/hobbitd.pid --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk --checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log --admin-senders=127.0.0.1,$BBSERVERIP
The last zombie appeared at 11:36. This is the hobbitd.log (debug mode) from around that time; it looks good, I think ;-)
...
2005-10-25 11:36:11 Sending heartbeat to pid 25202
2005-10-25 11:36:17 Sending heartbeat to pid 25202
2005-10-25 11:36:17 -> check_purple_status
2005-10-25 11:36:17 <- check_purple_status
2005-10-25 11:36:17 -> generate_stats
2005-10-25 11:36:17 <- generate_stats
2005-10-25 11:36:17 -> get_hts
2005-10-25 11:36:17 <- get_hts
2005-10-25 11:36:17 ->handle_status
2005-10-25 11:36:17 posting to status channel
2005-10-25 11:36:17 -> posttochannel
2005-10-25 11:36:17 Dropping message - no readers
2005-10-25 11:36:17 <-handle_status
2005-10-25 11:36:23 Sending heartbeat to pid 25202
2005-10-25 11:36:25 -> save_checkpoint
2005-10-25 11:36:25 <- save_checkpoint
2005-10-25 11:36:29 Sending heartbeat to pid 25202
2005-10-25 11:36:35 Sending heartbeat to pid 25202
...
This happens on Debian Linux 3.1 'Sarge'. An analogous config on several Solaris 8 SPARC boxes shows no problem.
On the Debian box, I have DISABLED bbdisplay in hobbitlaunch.cfg, because this box should only act as a kind of LAN probe (only bb-net and forwarding of LAN client statuses to a central bbdisplay).
No big issue (my next step: increase the checkpoint interval ;-) but maybe someone else will run into trouble.
:-) Michael
Michael Heinecke CAX / UNIX&VoD
HanseNet Telekommunikation GmbH Überseering 33 a, 22297 Hamburg Telefon: +49 (0)40 23726-2768 Telefax: +49 (0)40 23726-3485
http://www.alice-dsl.de, http://www.hansenet.de
On Tue, Oct 25, 2005 at 11:57:57AM +0200, Heinecke at hansenet.com wrote:
> Hi,
> every "checkpoint-interval" i get a new hobbitd zombie process.
> #> ps auxwww
> ----snip ---
> hobbit 25559 0.0 0.0 0 0 ? Z 11:06 0:00 [hobbitd] <defunct>
> hobbit 25917 0.0 0.0 0 0 ? Z 11:16 0:00 [hobbitd] <defunct>
> hobbit 26283 0.0 0.0 0 0 ? Z 11:26 0:00 [hobbitd] <defunct>
> hobbit 26648 0.0 0.0 0 0 ? Z 11:36 0:00 [hobbitd] <defunct>
You're right that it is related to the checkpointing: hobbitd forks a child process to save the checkpoint file.
What I don't understand is why the child isn't cleaned up afterwards. Could you run "ps -lw -u hobbit"? I'm curious to see what the PPID is for these zombies.
> This happens on Debian LINUX 3.1 'Sarge'. Analog config on different Solaris 8 SPARC Boxes => no problem.
> On the Debian box, i have DISABLED bbdisplay in hobbitlaunch.cfg, because this box should only act as a kind of LAN probe (only bb-net and forwarding of LAN client stati to a central bbdisplay)
In that case you don't need hobbitd running at all.
Hmm - perhaps this happens because there are no messages sent to this hobbitd instance. I think that's the cause - looking over the code it seems that if no messages arrive, the code to clean up the child processes is never reached.
The attached patch should fix it, although it is of course a non-issue if you stop hobbitd on this box.
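For anyone reading this in the archive without the attachment: the following is NOT the patch and not hobbitd's actual code, only a minimal sketch of the general idea, assuming a POSIX system. The names start_checkpoint_child(), reap_children() and loop_iteration() are made up for illustration. The point is that the non-blocking waitpid() call runs on every pass of the main loop, whether or not a message arrived, so an exited checkpoint child cannot linger as a zombie.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the checkpoint writer: fork a child that would
 * save the checkpoint file and then exit. Returns the child's pid in the
 * parent, or -1 if fork() failed. */
static pid_t start_checkpoint_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: a real daemon would write its state to disk here. */
        _exit(0);
    }
    return pid;
}

/* Collect every child that has already exited, without blocking.
 * Returns the number of children reaped on this call. */
static int reap_children(void)
{
    int reaped = 0;
    int status;

    for (;;) {
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;            /* one exited child collected, look for more */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;            /* interrupted by a signal, retry */
        break;                   /* 0: children still running; -1: no children */
    }
    return reaped;
}

/* One pass of a hypothetical main loop. The reaping is deliberately placed
 * outside the message-handling branch so it also runs on idle passes. */
static int loop_iteration(int have_message, int checkpoint_due)
{
    if (have_message) {
        /* handle the incoming message here */
    }
    if (checkpoint_due)
        start_checkpoint_child();

    return reap_children();
}
```

A daemon would call loop_iteration() from its select()/timeout loop. Other approaches exist as well (for example a SIGCHLD handler); this sketch only shows the polling variant.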
Regards, Henrik
participants (2)
- Heinecke@hansenet.com
- henrik@hswn.dk