This is a somewhat old post, but I'm responding anyway ...
In <AANLkTinFdgiz2ie3NCxhuop8picZj6izZPdH6fESQfif at mail.gmail.com> Steve Holmes <sholmes42 at mac.com> writes:
Please see below; there is a problem with disk monitoring on one of the servers. Can someone tell me if I did something wrong?
W]d Jul 28 10:34:31 EDT 2010 - Filesystems NOT ok
7% / (8816628% used) has reached the PANIC level (95%)
38% /u01 (90371708% used) has reached the PANIC level (95%)
Filesystem 10 4-b]ocks Used Available Capacity Mounted on
/dev/sda9    9920592   591896   8816628  7% /
/dev/sda10 152435112 54195172  90371708 38% /u01
/dev/sda8    9920592   154056   9254468  2% /tmp
It appears that Xymon has slipped one field to the left in parsing the df output. The string at the beginning of each of the lines before the actual df output should be the name of the filesystem (plus an icon, but we'll ignore that for now). Instead, it is the Capacity value, and Xymon is then using the Available number as the percent used, which, of course, is huge.
I don't know if this is causing the problem but there is some funkiness with the first line of the df output. It is broken between the 10 and the 4 and there is a ']' instead of an 'l' in the word "blocks". Maybe this is a cut/paste error, but if not, it is certainly not right.
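To illustrate the one-field slip described above, here is a minimal Python sketch. It is not Xymon's parser (which is written in C); the function name parse_df_line and its shift parameter are made up for this example. It only shows that reading each df data line one column to the left turns the Available value into the "percent used" figure seen in the report.

```python
# Hypothetical illustration only: this is NOT how Xymon parses df output.
# Data lines are copied from the df output quoted in the report.
DF_LINES = [
    "/dev/sda9 9920592 591896 8816628 7% /",
    "/dev/sda10 152435112 54195172 90371708 38% /u01",
    "/dev/sda8 9920592 154056 9254468 2% /tmp",
]


def parse_df_line(line, shift=0):
    """Return (name, percent_used) as strings for one df data line.

    Columns as printed by df:
      0 device, 1 blocks, 2 used, 3 available, 4 capacity, 5 mount point.
    shift=0 reads the columns correctly; shift=1 reads every column
    one position to the left, which is the suspected failure mode.
    """
    fields = line.split()
    name = fields[5 - shift]                 # should be the mount point
    percent = fields[4 - shift].rstrip("%")  # should be the capacity
    return name, percent


if __name__ == "__main__":
    for line in DF_LINES:
        print("correct:", parse_df_line(line), " shifted:", parse_df_line(line, shift=1))
```

With shift=1 the "name" becomes the Capacity column (7%, 38%) and the "percent used" becomes the Available column (8816628, 90371708), matching the numbers in the PANIC lines. Why the real parser would pick the wrong column is not explained by this sketch; a damaged header line is one possibility, but that is only a guess.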
There is a bug somewhere in the Xymon 4.3.0-beta code with the "df" status handling. I've seen it cause random RRD files to appear for systems that don't have such filesystems, and occasionally it would also result in this behaviour where a disk status goes wild.
I haven't been able to nail it yet, mostly because it seems to happen very rarely and completely without any pattern. It would seem like some sort of memory corruption problem, but I've had the client-message handler running for days with valgrind (memory access checker) enabled, and it came up with nothing.
Very annoying.
Regards, Henrik