I have a question regarding Xymon 4.3.3 (running on CentOS 5.6/x86_64). To monitor the presence of certain processes such as rsyslogd(8), I have the following process rule defined in analysis.cfg:
CLASS=linux
    PROC "%^/sbin/rsyslogd -m 0$"
This works fine as long as the columns in the output of ps(1) (more specifically, “ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd” as defined in xymonclient-linux.sh) are all nicely aligned:
PID PPID USER STARTED S PRI %CPU TIME %MEM RSZ VSZ CMD
[...]
4620 4607 68 Jun 17 S 22 0.0 00:00:00 0.0 860 12348 hald-addon-keyboard: listening on /dev/input/event0
4709 1 root Jun 17 S 17 0.0 00:00:00 0.0 496 8540 /usr/bin/hidd --server
4739 1 root Jun 17 S 21 0.0 00:11:14 0.0 3576 300132 /sbin/rsyslogd -m 0
6894 1 root Jun 17 S 18 0.0 00:00:00 0.0 1540 122008 automount
6918 1 root Jun 17 S 24 0.0 00:00:08 0.0 1224 63544 /usr/sbin/sshd
The trouble starts when the process in question has been running long enough (as seen on a different machine) that a value no longer fits the width reserved for its field, which shifts the rest of the line. Process runtime is just one example; I suppose any value that grows too large for its reserved space would trigger this behavior:
PID PPID USER STARTED S PRI %CPU TIME %MEM RSZ VSZ CMD
[...]
5377 1 root May 24 S 21 0.0 00:00:00 0.0 444 3816 /sbin/mingetty tty4
5378 1 root May 24 S 20 0.0 00:00:00 0.0 444 3816 /sbin/mingetty tty5
5380 1 root May 24 S 19 0.0 00:00:00 0.0 444 3816 /sbin/mingetty tty6
5382 1 root May 24 S 22 0.0 00:00:00 0.0 496 3824 /sbin/agetty 9600 ttyS1 vt100
8734 1 root Jun 20 S 21 7.7 3-06:51:29 0.1 48640 292664 /sbin/rsyslogd -m 0
20468 262 root Jul 19 S 24 0.0 00:04:01 0.0 0 0 [pdflush]
In this case the above regex no longer seems to match, apparently because the matching starts at a fixed column. Just for fun, and to double-check, I extended the process rule set with another rule:
CLASS=linux
    PROC "%^/sbin/rsyslogd -m 0$"
    PROC "%^[0-9]+ /sbin/rsyslogd -m 0$"
After doing so, the first rule indeed still fails while the second one matches. So apparently the last digit of rsyslogd(8)'s VSZ field sneaked into the CMD field and is matched by the PROC check. Is this a known bug, and if so, is there a good workaround apart from calling a wrapper script in xymonclient-linux.sh that mangles the output of ps(1) accordingly?
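To make the suspected mechanism concrete, here is a small Python sketch. It is a simplified, hypothetical model, not Xymon's parser: it assumes the CMD field is cut at the character offset where the "CMD" header starts, models only the PID, TIME and VSZ columns, and uses assumed column widths rather than the real procps ones. The two regexes are the PROC rules above without Xymon's leading '%'.

```python
import re

# Hypothetical, simplified model: CMD is assumed to start at the character
# offset of the "CMD" header. Widths are assumptions, not procps values.
COLUMNS = [("PID", 5), ("TIME", 8), ("VSZ", 6)]


def ps_line(values, cmd):
    """Format one ps-style row. A value wider than its column is not truncated,
    so it pushes every following column (including CMD) to the right."""
    fields = [str(v).rjust(width) for v, (_, width) in zip(values, COLUMNS)]
    return " ".join(fields + [cmd])


HEADER = ps_line([name for name, _ in COLUMNS], "CMD")
CMD_OFFSET = HEADER.index("CMD")


def cmd_field(line):
    """Cut CMD at the fixed offset taken from the header line."""
    return line[CMD_OFFSET:]


# The two PROC rules from analysis.cfg, minus Xymon's leading '%' marker.
RULE = re.compile(r"^/sbin/rsyslogd -m 0$")
WORKAROUND = re.compile(r"^[0-9]+ /sbin/rsyslogd -m 0$")

aligned = ps_line([4739, "00:11:14", 300132], "/sbin/rsyslogd -m 0")
overflow = ps_line([8734, "3-06:51:29", 292664], "/sbin/rsyslogd -m 0")

for line in (aligned, overflow):
    cmd = cmd_field(line)
    print(repr(cmd), bool(RULE.search(cmd)), bool(WORKAROUND.search(cmd)))
```

In this model the 10-character TIME value overflows its 8-character column by two, so the fixed cut lands on the last VSZ digit and yields "4 /sbin/rsyslogd -m 0": the first rule fails and the second matches, as observed.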
Thanks in advance!
-cs
On Mon, 01 Aug 2011 20:32:10 +0200, Christoph Schug <cs at schug.net> wrote:
> I have a question regarding Xymon 4.3.3 (running on CentOS 5.6/x86_64). To monitor the presence of certain processes such as rsyslogd(8), I have the following process rule defined in analysis.cfg:
>
> CLASS=linux
>     PROC "%^/sbin/rsyslogd -m 0$"
> [...]
I was asked off-list (thanks, but honestly I think we all benefit more if the discussion stays on the list):
"Why not just dispense with the '^'? That way the RE will match regardless of where it starts."
I'd like the matching to be as exact as possible for all my processes. rsyslogd(8) is just an example; the same applies to, for instance, shell scripts that run for a very long time or as daemons. So I'd rather use
PROC "%^/foo/bar$"
instead of just
PROC "/foo/bar"
or some other relaxed regex, because otherwise a local user might look at the script using more(1), and I don't want the process monitoring to match things like "more /foo/bar". That would report wrong process counts, or might even report the check as GREEN while the instance that is supposed to be running no longer is.
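A minimal Python sketch of the false-GREEN scenario. The process list is hypothetical, and it assumes that a PROC pattern without '%' behaves like a plain substring match while the '%' form is an (unanchored unless anchored explicitly) regex search.

```python
import re

# Hypothetical process table: the monitored /foo/bar instance has died, but a
# local user is paging through the script with more(1).
running = ["/usr/sbin/sshd", "more /foo/bar"]

STRICT = re.compile(r"^/foo/bar$")  # PROC "%^/foo/bar$" without the '%' marker


def loose_count(cmds):
    # Assumption: a PROC pattern without '%' is treated as a substring match.
    return sum("/foo/bar" in c for c in cmds)


def strict_count(cmds):
    return sum(bool(STRICT.search(c)) for c in cmds)


print(loose_count(running), strict_count(running))  # 1 0
```

The loose rule still counts one matching process and would stay GREEN, while the anchored rule correctly counts zero.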
-cs
Okay, I looked more closely at the source. Apparently the column positions of the fields in the ps(1) output are determined automatically from the positions of the headers. So, just to let you know, the following patch worked for my setup:

--- xymon-4.3.4/client/xymonclient-linux.sh.orig	2011-07-31 23:01:52.000000000 +0200
+++ xymon-4.3.4/client/xymonclient-linux.sh	2011-08-02 13:40:14.000000000 +0200
@@ -68,7 +68,7 @@
 # Report mdstat data if it exists
 if test -r /proc/mdstat; then echo "[mdstat]"; cat /proc/mdstat; fi
 echo "[ps]"
-ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd
+ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time:12,pmem,rsz:10,vsz:10,cmd
 
 # $TOP must be set, the install utility should do that for us if it exists.
 if test "$TOP" != ""

Cheers
-cs
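Why widening the columns helps can be sketched with the same kind of simplified, hypothetical model: CMD is assumed to be cut at the offset of the "CMD" header, and only PID, TIME and VSZ are modelled. The unpatched widths (5, 8, 6) are assumptions; (5, 12, 10) mirrors the time:12 and vsz:10 widths from the patch (rsz:10 is not modelled).

```python
import re

RULE = re.compile(r"^/sbin/rsyslogd -m 0$")  # PROC rule minus the '%' marker
SAMPLE = [8734, "3-06:51:29", 292664]        # PID, TIME, VSZ of the long-running rsyslogd


def make_parser(widths):
    """Return (row, cmd_field) for right-justified PID, TIME and VSZ columns."""
    def row(values, cmd):
        return " ".join([str(v).rjust(w) for v, w in zip(values, widths)] + [cmd])

    # CMD is assumed to start at the offset of the "CMD" header.
    offset = row(["PID", "TIME", "VSZ"], "CMD").index("CMD")
    return row, lambda line: line[offset:]


# (5, 8, 6) stands in for the unpatched widths and is an assumption;
# (5, 12, 10) mirrors the time:12 and vsz:10 widths from the patch.
for label, widths in [("unpatched", (5, 8, 6)), ("patched", (5, 12, 10))]:
    row, cmd_field = make_parser(widths)
    cmd = cmd_field(row(SAMPLE, "/sbin/rsyslogd -m 0"))
    print(label, repr(cmd), bool(RULE.search(cmd)))
```

In this model the wider columns only raise the threshold rather than remove it: a TIME value longer than 12 characters would shift the columns again.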