False red alerts for disk
Hi,
Running Xymon 4.3.0-0.beta2, I sometimes get false red disk alerts on a few servers (one of them is the Xymon server itself).
Usually the disk status is reported green, like this:
Wed Jun 10 16:29:17 CEST 2009 - Filesystems OK
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 204603376 1616748 192593380 1% /
But occasionally, I get red alerts, like this:
- Filesystems NOT ok
red 192593256 1% / (1616872% used) has reached the PANIC level (95%)
Filesystem 1024-blocks Use] Available Capacity Mounted on
/dev/sda1 204603376 1616872 192593256 1% /
Somehow the client data is not parsed correctly, so a disk block count ends up being interpreted as the percentage used.
The corresponding df part in the actual client report looks like this:
[df]
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 204603376 1616872 192593256 1% /
On another server, the false red alert looks like this:
Wed Jun 10 15:51:53 CEST 2009 - Filesystems NOT ok
red 44% / (2778580% used) has reached the PANIC level (95%)
red 6% /home (2167204% used) has reached the PANIC level (95%)
Filesystem 1 24-]locks Used Available Capacity Mounted on
/dev/xvda2 5162828 2121988 2778580 44% /
/dev/xvda3 24 7244 ] 136744 2167204 6% /home
While it usually looks like this:
Wed Jun 10 15:56:54 CEST 2009 - Filesystems OK
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/xvda2 5162828 2122012 2778556 44% /
/dev/xvda3 2427244 136784 2167164 6% /home
Slightly different, but once again a block count (this time from the Available column) is being interpreted as the percentage used.
Does anyone have an idea what might be causing this?
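The symptom can be reproduced outside Xymon: split a df data row on whitespace, read the Capacity field from one or two columns too far to the left, and treat the remaining fields as the mount point. That yields exactly the text of the red alerts above. This is a hypothetical Python sketch; `parse_df_row`, `alert_text` and the `cap_idx` parameter are made up for illustration and are not Xymon's parser. It only shows what a column shift would produce, not why the shift happens.

```python
# Hypothetical sketch (not Xymon's parser): take the Capacity field from a
# fixed column index and treat everything after it as the mount point.


def parse_df_row(row: str, cap_idx: int) -> tuple[str, str]:
    """Split a df data row on whitespace and return (mount point, capacity).

    cap_idx is the 0-based column assumed to hold Capacity; the mount point
    is taken to be all remaining fields joined by single spaces.
    """
    fields = row.split()
    return " ".join(fields[cap_idx + 1:]), fields[cap_idx]


def alert_text(row: str, cap_idx: int) -> str:
    """Format the result like the '<mount> (<N>% used)' part of an alert line."""
    mount, capacity = parse_df_row(row, cap_idx)
    return f"{mount} ({capacity.rstrip('%')}% used)"


# Data rows copied from the reports above.
sda1 = "/dev/sda1 204603376 1616872 192593256 1% /"
xvda2 = "/dev/xvda2 5162828 2121988 2778580 44% /"

# Correct column: Capacity is field 4 (Filesystem=0, 1024-blocks=1, ...).
print(alert_text(sda1, 4))   # / (1% used)
# Two columns too far left: matches the first server's red alert.
print(alert_text(sda1, 2))   # 192593256 1% / (1616872% used)
# One column too far left: matches the second server's red alert.
print(alert_text(xvda2, 3))  # 44% / (2778580% used)
```

The two servers are off by different amounts (two columns versus one), which fits intermittent corruption of the data better than a consistently wrong column lookup.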
Thanks,
Patrik Nilsson
I am now also seeing this with memory reports. There seems to be a general but intermittent problem with parsing the client data.
T 2009][uname] Linux tc1.jalbum.net 2.6.18-92.1 22.el5xen ]86_64 - Memory CRITICAL
Memory Used Total Percentage
red Physical 48576M 1M 4857600%
red Actual 819M 1M 81900%
green Swap 80M 1983M 4%
Notice the messed up brackets.
The corresponding part of the actual client data reported is:
client tc1,hostnamechanged,net.linux linux
[date]
Thu Jun 11 11:31:36 CEST 2009
[uname]
Linux tc1.hostnamechanged.net 2.6.18-92.1.22.el5xen x86_64
[osversion]
CentOS release 5.2 (Final)
[uptime]
11:31:36 up 26 days, 22:25, 1 user, load average: 0.12, 0.10, 0.03
[who]
root xvc0 May 15 13:09
[df]
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/mapper/VolGroup00-LogVol00 10102072 5553636 4131628 58% /
/dev/xvda1 101086 20724 75143 22% /boot
[mount]
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/xvda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
192.168.8.8:/mnt/share on /share type nfs (rw,addr=192.168.8.8)
[free]
total used free shared buffers cached
Mem: 1048576 1043172 5404 0 1936 201892
-/+ buffers/cache: 839344 209232
Swap: 2031608 82368 1949240
[ifconfig]
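The absurd memory percentages follow directly from the bogus 1M totals. The sketch below assumes Percentage = Used * 100 / Total on whole megabytes, and that Actual is measured against the physical total. Both are assumptions, not taken from Xymon's source, but the formula does reproduce the green Swap row (80M of 1983M = 4%). `kb_to_mb` and `pct_used` are hypothetical helper names; the 48576M "used" figure is copied as-is from the red report.

```python
# Hypothetical sketch: recompute the Memory percentages, assuming
# Percentage = Used * 100 // Total on whole megabytes (truncating).


def kb_to_mb(kb: int) -> int:
    """Convert a `free` value in KiB to whole MiB (truncating)."""
    return kb // 1024


def pct_used(used_mb: int, total_mb: int) -> int:
    """Integer percentage used (assumed formula)."""
    return used_mb * 100 // total_mb


# Values from the [free] section of the client data (KiB).
mem_total, mem_used = 1048576, 1043172
actual_used = 839344  # "used" column of the -/+ buffers/cache line
swap_total, swap_used = 2031608, 82368

# What the correctly parsed data gives (Actual assumed to use the Mem total).
correct = {
    "Physical": pct_used(kb_to_mb(mem_used), kb_to_mb(mem_total)),
    "Actual": pct_used(kb_to_mb(actual_used), kb_to_mb(mem_total)),
    "Swap": pct_used(kb_to_mb(swap_used), kb_to_mb(swap_total)),
}

# What the red report shows: Total read as 1M for both memory rows.
garbled = {
    "Physical": pct_used(48576, 1),
    "Actual": pct_used(kb_to_mb(actual_used), 1),
}

for name, pct in correct.items():
    print(f"{name}: {pct}%")  # Physical: 99%, Actual: 79%, Swap: 4%
for name, pct in garbled.items():
    print(f"{name} with total 1M: {pct}%")  # 4857600%, 81900%
```

So the Used values for Actual (819M) and Swap are parsed correctly; it is the memory Total that comes out as 1M instead of 1024M.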
Patrik
(In reply to Patrik Nilsson <patrik at jalbum.net>, Wed, Jun 10, 2009 at 4:57 PM; original message above.)