Your hobbitd_alert process dies for some reason, and when it restarts it has forgotten when the next alert is due to go out.
So why does it die ... the only reason I can come up with is that it catches a signal from a child process. Could you try changing line 332 of hobbitd/hobbitd_alert.c from sigaction(SIGPIPE, &sa, NULL); to signal(SIGPIPE, SIG_IGN);
and let me know if that keeps it running? If it does, then the mail program that is launched to send the alerts is doing something weird with its I/O.
I've changed the code, and it keeps doing it. From page.log:
2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 15:33:43 Worker process died with exit code 0, terminating
2005-03-27 15:33:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:33:43 Channel not available
2005-03-27 22:55:21 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Could not get shm of size 102400: No such file or directory
2005-03-27 22:58:15 Channel not available
2005-03-27 23:46:48 Worker process died with exit code 0, terminating
2005-03-27 23:46:48 Could not get shm of size 102400: No such file or directory
2005-03-27 23:46:48 Channel not available
2005-03-28 00:08:06 Worker process died with exit code 0, terminating
2005-03-28 00:08:07 Could not get shm of size 102400: No such file or directory
2005-03-28 00:08:07 Channel not available
I've been sending alerts using a script, so maybe it's crummy. I've changed to just sending mail and will let you know if it still happens.
BTW, I've just realized that a rule was using a macro that didn't exist. I don't think that's a problem, is it?
In enadis.log (which I suppose is enable/disable) I got these too:
2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 19:35:17 Worker process died with exit code 0, terminating
2005-03-27 19:35:17 Could not get shm of size 102400: No such file or directory
2005-03-27 19:35:17 Channel not available
I was not playing with maintenance (though I do have a couple of DOWNTIME entries in bb-hosts), so what could be going on here?
-- olivier