The trouble is that the OS blocks on the I/O request while waiting for a response from the missing NFS server. The xymon client has no notion of elapsed 'real time', so it cannot detect that it has been blocked waiting for another program to return.
For these types of errors I create an ext script that uses expect. In the expect script you spawn the test, and with expect's timeout capability you can then alarm on the timeout.
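The same spawn-and-timeout idea can be sketched without expect. The following is a minimal Python illustration, not the expect script itself: the function name check_command, the choice of timeout, and the mapping of results to the green/yellow/red colours are assumptions made for the example. It starts the command, waits a bounded time, and reports red if the command has not returned. Note that it deliberately does not wait for the child after killing it, because a process stuck in NFS I/O may not exit until the server comes back.

```python
import subprocess


def check_command(cmd, timeout_s):
    """Run cmd and return (color, text); 'red' means it did not finish in time.

    Hypothetical helper for an ext script; names and colour mapping are
    illustrative assumptions.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        # communicate() drains the pipe and gives up after timeout_s seconds.
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Request a kill, but do NOT wait for the child: a process blocked
        # on a dead NFS mount may ignore the signal until the I/O completes,
        # and waiting here would hang the wrapper as well.
        proc.kill()
        return "red", f"{' '.join(cmd)} did not return within {timeout_s} s"
    color = "green" if proc.returncode == 0 else "yellow"
    return color, output
```

An ext script could call, for example, check_command(["df", "-P"], 30) and send the resulting colour and text to the server as a status message for a separate column, so that a hanging df shows up as an alert instead of silence.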
From: Xymon <xymon-bounces at xymon.com> on behalf of Cédric BRINER <Cedric.BRINER at UniGE.ch>
Sent: Friday, 19 June 2015 5:54 PM
To: Mark Felder; xymon at xymon.com
Subject: Re: [Xymon] df hanging will cause xymon-client to hang.
The error happened because an NFS resource was not responding. I suppose that, since xymon launches "df" to get information and the NFS mount was hanging, the xymon-client is not able to detect that df hangs, and it does not send any data to the server. Worse, the xymon-client is not verbose at all: it does not report that this test (df) is taking too long to complete.
What I found very disturbing is that there are no logs at all saying that df is taking a long time to complete. Instead of finding a way to stop the xymon-client from hanging, could we somehow let xymon-client write a message to the log saying that df has not returned for a long time?
cED
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon