On Wednesday, 11 November 2009 22:37:56 j.sansford at ntlworld.com wrote:
We have the same problem. I've even got devmon configured under SMF on Solaris, but SMF doesn't pick up the fact that it has crashed, because the process is still there.
It doesn't crash. As far as I can tell, eventually all the child processes lose communication with the master process, but they are all still running, just waiting for someone to tell them to do something.
Our quick and dirty workaround is to send an alert when the "dm" monitor goes purple. This tells the on-call engineer that we are no longer effectively monitoring the network devices, so they can restart the process!
There must be a better way though...
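One way to automate the workaround above would be a small watchdog that looks at the status board and flags devmon for a restart whenever "dm" has gone purple. The following is only a sketch under stated assumptions: the input format ("hostname|testname|color", one status per line) and the function names purple_hosts and should_restart are hypothetical, and how you fetch the board lines and restart devmon depends on your installation.

```python
def purple_hosts(board_lines, test="dm"):
    """Return the hostnames whose `test` status is purple.

    Assumption: each line looks like "hostname|testname|color".
    """
    stale = []
    for line in board_lines:
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        host, name, color = parts
        if name == test and color == "purple":
            stale.append(host)
    return stale


def should_restart(board_lines, test="dm"):
    """True if any host reports a purple status for `test`."""
    return bool(purple_hosts(board_lines, test))
```

A cron job could call should_restart() on the current board output and, if it returns True, restart devmon through whatever service manager is in use instead of paging the on-call engineer.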
Devmon has had "goes purple" problems since 0.2.2 beta. I fixed the more frequent one before the 0.3.0 release.
Anyway, I've done some work on this. However, the only production instance of devmon that I currently look at often last went purple 9 days ago ...
If you can reproduce it more frequently, please have a look at the devmon-devel mailing list (or the archives[1] once they have updated). I just sent a mail there with a patch attached (against svn; it may apply to 0.3.1-beta1, but I haven't tried). The patch may fix the problem, allow us to narrow it down further, or at least eliminate one aspect as the cause.
Regards, Buchan