On Wed, 2013-07-31 at 17:20 +0100, John Horne wrote:
Hello,
I have just upgraded some client servers from Xymon 4.3.10 to 4.3.12. Two clients now continuously report that a specific process (clamd) is not running. (I have checked both servers and the process is running.)
In the monitoring server analysis.cfg file I have (for both clients):
PROC "% clamd$" TEXT=clamd
This worked fine for 4.3.10, but now seems to fail. We have some other process strings containing 'clamd', so we had to use the above to ensure that we were just monitoring the main 'clamd' process.
Well, the code for regular expressions started to get a bit complicated, so I thought I would look to see which regexes worked and which didn't.
The 'ps' output the Xymon server receives (taken from a data/hostdata file) is, as an example:
6810 1 clam Jul 11 S 21 0.1 00:40:42 11.6 49000 493632 clamd
As can be seen, the 'clamd' command is just that: no pathname and no command options. It is preceded by a space, which is why we used the above regex.
For the Xymon 4.3.12 clients I found that using '%^clamd$' worked, and that '% clamd$' did not (note there is a space between the '%' and 'c' characters).
For the Xymon 4.3.10 clients I found the opposite. '%^clamd$' did not work, but '% clamd$' did.
I am, however, confused. As far as I was aware the procs processing is done on the Xymon server, not the client. Our main Xymon server was updated to 4.3.12 yesterday, and all four of our clamd clients showed no errors in the procs column. Today I updated two clients to 4.3.12 and started to get the errors about 'clamd' not running.
So, in effect, we have two 4.3.10 clients using the old regex (' clamd$') and that works. And we have two 4.3.12 clients using the new regex ('^clamd$') and they too work. But if this is all processed on the server, then I would expect two of the clients to be reporting errors. Since that is not happening, I can only assume that the client is either doing the processing itself or communicating something to the main server that affects it.
As to what has changed, I can only assume that the 'ps' processing of the client data, when using a regex, works only on the actual command being run and not the whole line from the 'ps' output. Hence '^clamd$' now works (for 4.3.12) and ' clamd$' does not.
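
To illustrate that guess, here is a small Python sketch that tests both regexes against the whole 'ps' line and against the command field alone. It is not Xymon code: Xymon uses PCRE, and Python's 're' module is only standing in for it here. The 'matches' helper and the idea of taking the last field as the command are my own assumptions for this sample line, where the command has no arguments.

```python
import re

# Sample 'ps' line as received by the Xymon server (taken from the message above).
PS_LINE = "6810 1 clam Jul 11 S 21 0.1 00:40:42 11.6 49000 493632 clamd"

# Assumption: the command is the last whitespace-separated field.
# This only holds for this sample, where clamd has no path and no options.
COMMAND_ONLY = PS_LINE.split()[-1]


def matches(pattern, text):
    """Hypothetical helper: True if the regex matches anywhere in the text."""
    return re.search(pattern, text) is not None


# The '%' prefix in analysis.cfg only marks the string as a regex,
# so it is dropped here.
for pattern in (r" clamd$", r"^clamd$"):
    print(
        repr(pattern),
        "whole line:", matches(pattern, PS_LINE),
        "command only:", matches(pattern, COMMAND_ONLY),
    )
```

With this sample, ' clamd$' matches only the whole line and '^clamd$' matches only the command field, which is at least consistent with the behaviour seen on the 4.3.10 and 4.3.12 clients respectively.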
John.
-- John Horne, Plymouth University, UK Tel: +44 (0)1752 587287 Fax: +44 (0)1752 587001