Hi,
I have been trying to find out whether Xymon can detect that a Linux file system has gone read-only as a result of a disk error (other than reporting it just once via monitoring /var/log/messages). Nothing is showing up on my Xymon server, but my xymon-client is a bit old: xymon-client-4.3.7-26.1.el5.tnt
I did a bit of Googling and I came up with these two links that may be relevant: http://sisyphus.ru/en/srpm/Sisyphus/xymon/sources/8 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=764197
It seems that an RPM maintainer may have modified their version to catch disks in a read-only state (the first link), and that there is a mount-ro plugin that is part of the hobbit-plugins package in Debian / Ubuntu. Does anyone have more information on either of these, and on whether any patches can be integrated upstream or plug-ins added to xymonton? CCing Axel Beckert as he seems to have committed something to the mount-ro plugin recently: https://www.openhub.net/p/hobbit-plugins/commits
Although we have some Debian systems, I was looking for a solution for another Linux distro.
If I were to write something myself to do it, I would check /proc/mounts. The best command I could find was:
    awk '$4~/(^|,)ro($|,)/' /proc/mounts
which outputs, for example:
    /dev/root / ext3 ro,data=ordered 0 0
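A minimal sketch of how that check could be wrapped as a client-side extension script. The column name "mountro" and the function names are made up for illustration, and I am assuming the script is launched by the Xymon client so that XYMON, XYMSRV and MACHINE are set in its environment:

```shell
#!/bin/sh
# Sketch only: report read-only mounts to Xymon from a client extension script.
# Assumptions (not from any existing plugin): the column name "mountro" and the
# function names are hypothetical; XYMON, XYMSRV and MACHINE are expected to be
# provided by the Xymon client environment.

# Print every mount whose option field ($4) contains "ro" as a whole option.
# The file argument defaults to /proc/mounts and exists mainly for testing.
list_ro_mounts() {
    awk '$4 ~ /(^|,)ro($|,)/' "${1:-/proc/mounts}"
}

# Print "red" if any read-only mount is found, otherwise "green".
ro_color() {
    if [ -n "$(list_ro_mounts "$1")" ]; then
        echo red
    else
        echo green
    fi
}

# Only send a status when running under the Xymon client.
if [ -n "${XYMON:-}" ] && [ -n "${XYMSRV:-}" ]; then
    color=$(ro_color)
    mounts=$(list_ro_mounts)
    "$XYMON" "$XYMSRV" "status ${MACHINE:-}.mountro $color $(date)

Read-only mounts:
${mounts:-none}"
fi
```

Mounts that are read-only on purpose (a CD-ROM, for instance) would also turn this red, so a real script would probably need an exclusion list.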
This command also produced a nice summary output that might be good to have on a Xymon status page: cat /proc/mounts|sort|awk '{print $1 "\011" toupper(substr($4,0,2)
The following was at the bottom of /var/log/messages. It does not suggest any obvious alarm strings to add, other than the last line without the 'dm-0'. Something more generic would be nicer still, as textual messages can change between different versions of the O/S.
kernel: sd 0:0:0:0: Unhandled sense code
kernel: sd 0:0:0:0: SCSI error: return code = 0x08100002
kernel: Result: hostbyte=invalid driverbyte=DRIVER_SENSE,SUGGEST_OK
kernel: sda: Current: sense key: Hardware Error
kernel: Add. Sense: Defect list error
kernel:
kernel: Buffer I/O error on device dm-0, logical block 1358756
kernel: lost page write due to I/O error on dm-0
Kind regards,
SebA