I've set up a rule with REPEAT=7d.
On the "info" page, I see what I expect: ping olivier at qalpit.com (R) 2m - 1w - red
But I keep receiving mails at irregular intervals. From notifications.log:

    Sun Mar 27 11:26:03 2005
    Sun Mar 27 11:56:20 2005
    Sun Mar 27 12:11:42 2005

and so on up to now.
In page.log, I see this:

    2005-03-27 11:09:08 Worker process died with exit code 0, terminating
    2005-03-27 11:09:08 Could not get shm of size 102400: No such file or directory
    2005-03-27 11:09:08 Channel not available
    2005-03-27 11:18:54 Worker process died with exit code 0, terminating
    2005-03-27 11:18:54 Could not get shm of size 102400: No such file or directory
    2005-03-27 11:18:54 Channel not available
    2005-03-27 11:48:39 Worker process died with exit code 0, terminating
    2005-03-27 12:01:01 Worker process died with exit code 0, terminating
Should I restart Hobbit to clean up all the IPC resources?
-- Olivier Beau
On Sun, Mar 27, 2005 at 12:31:12PM +0200, olivier at qalpit.com wrote:
> in page.log, i see this : [...]
Your hobbitd_alert process dies for some reason, and when it restarts it has forgotten when the next alert is due to be sent.
So why does it die? The only reason I can come up with is that it catches a signal from a child process. Could you try changing line 332 of hobbitd/hobbitd_alert.c from

    sigaction(SIGPIPE, &sa, NULL);

to

    signal(SIGPIPE, SIG_IGN);

and let me know whether that keeps it running? If it does, then the mail program that is launched to send the alerts does something weird with its I/O.
Henrik
I've changed the code, and it keeps doing it. In page.log:

    2005-03-27 15:27:43 Worker process died with exit code 0, terminating
    2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
    2005-03-27 15:27:43 Channel not available
    2005-03-27 15:33:43 Worker process died with exit code 0, terminating
    2005-03-27 15:33:43 Could not get shm of size 102400: No such file or directory
    2005-03-27 15:33:43 Channel not available
    2005-03-27 22:55:21 Worker process died with exit code 0, terminating
    2005-03-27 22:58:15 Worker process died with exit code 0, terminating
    2005-03-27 22:58:15 Could not get shm of size 102400: No such file or directory
    2005-03-27 22:58:15 Channel not available
    2005-03-27 23:46:48 Worker process died with exit code 0, terminating
    2005-03-27 23:46:48 Could not get shm of size 102400: No such file or directory
    2005-03-27 23:46:48 Channel not available
    2005-03-28 00:08:06 Worker process died with exit code 0, terminating
    2005-03-28 00:08:07 Could not get shm of size 102400: No such file or directory
    2005-03-28 00:08:07 Channel not available
I've been sending alerts using a script, so maybe it's crummy. I've switched to just sending mail and will let you know if it still happens.
BTW, I've just realized that a rule was using a macro that didn't exist. I don't think that's a problem?
In enadis.log (which I suppose is enable/disable), I got these too:

    2005-03-27 15:27:43 Worker process died with exit code 0, terminating
    2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
    2005-03-27 15:27:43 Channel not available
    2005-03-27 19:35:17 Worker process died with exit code 0, terminating
    2005-03-27 19:35:17 Could not get shm of size 102400: No such file or directory
    2005-03-27 19:35:17 Channel not available
I was not playing with maintenance (though I do have a couple of DOWNTIME settings in bb-hosts), so what could be going on here?
-- olivier