Hi,
On Mon, Mar 09, 2015 at 12:44:03PM -0000, SebA wrote:
I have been trying to find out if there is a way of Xymon detecting that a file-system in Linux has gone read-only as a result of a disk error (other than reporting it just the once via monitoring /var/log/messages). Nothing is showing up in my Xymon server, but my xymon-client is a bit old: xymon-client-4.3.7-26.1.el5.tnt
I did a bit of Googling and I came up with these two links that may be relevant: http://sisyphus.ru/en/srpm/Sisyphus/xymon/sources/8 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=764197
It seems that an RPM maintainer may have made some modifications to their version in order to catch disks in a read-only state (in the first link) and that there is a mount-ro plugin that is part of the hobbit-plugins package in Debian / Ubuntu. Does anyone have more information on either of these
I'm one of the maintainers of Debian's hobbit-plugins package, so yes. :-)
and whether any patches can be integrated upstream or plug-ins added to xymonton?
I'm not sure where exactly on https://wiki.xymonton.org/ I should add our set of plugins.
CCing Axel Beckert as he seems to have committed something to the mount-ro plugin recently: https://www.openhub.net/p/hobbit-plugins/commits
Hrm, OpenHub seems horribly out of date with most projects recently... The full view on that Git repo is at https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/
The source code of the mount-ro plugin is quite simple: https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/misc.d/...
The mount-ro script is not a standalone plugin, though. It is meant for the meta-plugin "misc", which calls all scripts in /etc/xymon/misc.d/ and summarizes their exit codes into a single check. The misc plugin is intended for checks which only very rarely turn yellow or red and for which you don't want to waste a whole column each.
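To illustrate the idea of such a meta-plugin, here is a minimal sketch in Python. It is not the actual misc plugin from hobbit-plugins. The function names and the mapping of exit codes to colors (0 is green, anything else is red) are assumptions made for the example.

```python
import os
import subprocess


def run_checks(directory):
    """Run every executable file in `directory` and collect its exit code."""
    results = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            completed = subprocess.run([path], capture_output=True)
            results[name] = completed.returncode
    return results


def summarize(results):
    """Fold all exit codes into one status.

    Assumption for this sketch: exit code 0 means OK and any non-zero
    exit code means a problem. The real misc plugin may map codes differently.
    """
    failed = sorted(name for name, code in results.items() if code != 0)
    color = "green" if not failed else "red"
    return color, failed
```

Calling summarize(run_checks("/etc/xymon/misc.d")) would then give one color for the whole directory plus the names of the scripts that failed, which is what keeps many rarely-failing checks in a single column.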
misc plugin: https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/client-...
Hobbit.pm used in the misc plugin and many other plugins in that package: https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/perl/Ho...
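For the read-only detection itself, the general approach of looking at the mount options can be sketched as follows. This is not the source of the mount-ro plugin linked above, only a hedged illustration. The function names and the list of ignored file-system types are hypothetical, and the sketch assumes the usual /proc/mounts layout of device, mount point, type and options.

```python
def read_only_mounts(mounts_text, ignore_fstypes=("iso9660", "squashfs", "proc", "sysfs")):
    """Return the mount points whose option list contains a plain "ro".

    `mounts_text` is the content of /proc/mounts. `ignore_fstypes` is a
    hypothetical list of types that are expected to be read-only or virtual.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if fstype in ignore_fstypes:
            continue
        # Compare whole options so that "errors=remount-ro" does not match.
        if "ro" in options.split(","):
            found.append(mount_point)
    return found


def check(path="/proc/mounts"):
    """Print read-only mount points and return 1 if any were found, else 0."""
    with open(path) as handle:
        bad = read_only_mounts(handle.read())
    for mount_point in bad:
        print("read-only: " + mount_point)
    return 1 if bad else 0
```

A script dropped into a misc.d-style directory could pass the return value of check() to sys.exit(). File systems which are mounted read-only on purpose would need to be added to the ignore list or excluded by mount point.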
The following was at the bottom of /var/log/messages. It does not suggest any very obvious alarm strings to add, other than the last line without the 'dm-0'. It would be nicer to have something more generic still, as textual messages can change between different versions of the O/S.
kernel: sd 0:0:0:0: Unhandled sense code
kernel: sd 0:0:0:0: SCSI error: return code = 0x08100002
kernel: Result: hostbyte=invalid driverbyte=DRIVER_SENSE,SUGGEST_OK
kernel: sda: Current: sense key: Hardware Error
kernel: Add. Sense: Defect list error
kernel:
kernel: Buffer I/O error on device dm-0, logical block 1358756
kernel: lost page write due to I/O error on dm-0
That's probably something which can be caught via the LOG keyword in analysis.cfg.
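Before putting a pattern into a log rule, it can be tried against the quoted lines. The following Python sketch does that with a device-independent pattern. The pattern and the function name are assumptions for illustration only, and Python's re module is not the regex engine Xymon uses, so the result should be re-checked on the Xymon side.

```python
import re

# Hypothetical pattern: match "I/O error on <device>" for any device name,
# with or without the word "device" in between.
IO_ERROR = re.compile(r"I/O error on (?:device )?([\w-]+)")

LOG_LINES = [
    "kernel: sd 0:0:0:0: Unhandled sense code",
    "kernel: sd 0:0:0:0: SCSI error: return code = 0x08100002",
    "kernel: sda: Current: sense key: Hardware Error",
    "kernel: Add. Sense: Defect list error",
    "kernel: Buffer I/O error on device dm-0, logical block 1358756",
    "kernel: lost page write due to I/O error on dm-0",
]


def matching_devices(lines):
    """Return the device name from every line the pattern matches."""
    devices = []
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            devices.append(match.group(1))
    return devices
```

With the lines above, only the last two match, both for dm-0. The SCSI sense lines are not caught by this pattern and would need a pattern of their own.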
Kind regards,
Axel Beckert
-- 
Axel Beckert <beckert at phys.ethz.ch>  support: +41 44 633 26 68
IT Services Group, HPT H 6              voice: +41 44 633 41 89
Departement of Physics, ETH Zurich
CH-8093 Zurich, Switzerland             http://nic.phys.ethz.ch/