Detecting read-only file system in Linux
Hi,
I have been trying to find out if there is a way of Xymon detecting that a file-system in Linux has gone read-only as a result of a disk error (other than reporting it just the once via monitoring /var/log/messages). Nothing is showing up in my Xymon server, but my xymon-client is a bit old: xymon-client-4.3.7-26.1.el5.tnt
I did a bit of Googling and I came up with these two links that may be relevant: http://sisyphus.ru/en/srpm/Sisyphus/xymon/sources/8 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=764197
It seems that an RPM maintainer may have made some modifications to their version in order to catch disks in a read-only state (in the first link) and that there is a mount-ro plugin that is part of the hobbit-plugins package in Debian / Ubuntu. Does anyone have more information on either of these, and on whether any patches can be integrated upstream or plug-ins added to xymonton? CCing Axel Beckert as he seems to have committed something to the mount-ro plugin recently: https://www.openhub.net/p/hobbit-plugins/commits
Although we have some Debian systems, I was looking for a solution for another Linux distro.
If I were to write something myself to do it, I would check /proc/mounts. The best command I could find was:
awk '$4~/(^|,)ro($|,)/' /proc/mounts
which outputs, for example:
/dev/root / ext3 ro,data=ordered 0 0
This command also produced a nice summary output that might be good to have on a Xymon status page: cat /proc/mounts|sort|awk '{print $1 "\011" toupper(substr($4,0,2)
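The summary command above is cut off in the archive, so the following is only a minimal sketch of what such a summary could look like, not a reconstruction of the original. The function name mount_summary and the output columns (device, state, mount point) are assumptions. It uses substr($4, 1, 2) because awk strings are 1-based, and it relies on the first mount option in /proc/mounts normally being "ro" or "rw".

```shell
#!/bin/sh
# Sketch only: print "<device><TAB><RO|RW><TAB><mount point>" for each mount.
# $1: a file in /proc/mounts format (defaults to /proc/mounts).
mount_summary() {
    sort "${1:-/proc/mounts}" | awk '{
        # Field 4 holds the mount options; its first two characters
        # are normally "ro" or "rw". substr() is 1-based in awk.
        print $1 "\t" toupper(substr($4, 1, 2)) "\t" $2
    }'
}
```

Calling mount_summary with no argument reads the live /proc/mounts; passing a file name makes it easy to try against saved data.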
The following was at the bottom of /var/log/messages. It does not suggest any obvious alarm strings to add, other than the last line without the 'dm-0'. It would be nicer to have something more generic, though, as textual messages can change between different versions of the OS.
kernel: sd 0:0:0:0: Unhandled sense code
kernel: sd 0:0:0:0: SCSI error: return code = 0x08100002
kernel: Result: hostbyte=invalid driverbyte=DRIVER_SENSE,SUGGEST_OK
kernel: sda: Current: sense key: Hardware Error
kernel: Add. Sense: Defect list error
kernel:
kernel: Buffer I/O error on device dm-0, logical block 1358756
kernel: lost page write due to I/O error on dm-0
Kind regards,
SebA
Hi,
On Mon, Mar 09, 2015 at 12:44:03PM -0000, SebA wrote:
> It seems that an RPM maintainer may have made some modifications to their version in order to catch disks in a read-only state (in the first link) and that there is a mount-ro plugin that is part of the hobbit-plugins package in Debian / Ubuntu. Does anyone have more information on either of these
I'm one of the maintainers of Debian's hobbit-plugins package, so yes. :-)
> and whether any patches can be integrated upstream or plug-ins added to xymonton?
I'm not sure where exactly at https://wiki.xymonton.org/ I should add our set of plugins.
> CCing Axel Beckert as he seems to have committed something to the mount-ro plugin recently: https://www.openhub.net/p/hobbit-plugins/commits
Hrm, OpenHub seems horribly out of date with most projects recently... The full view on that Git repo is at https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/
The source code of the mount-ro plugin is quite simple: https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/misc.d/...
It is not a standalone plugin, though. It is meant for the meta-plugin "misc", which calls all scripts in /etc/xymon/misc.d/ and summarizes their exit codes into a single check. This is intended for checks that go yellow/red only very seldom and for which you don't want to waste a whole column.
misc plugin: https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/client-...
Hobbit.pm used in the misc plugin and many other plugins in that package: https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/perl/Ho...
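To illustrate the idea of a meta-plugin that folds several scripts' exit codes into one status, here is a minimal shell sketch. It is not the hobbit-plugins implementation (which is written in Perl on top of Hobbit.pm). The function name run_misc_checks and the exit-code mapping (0 = green, 1 = yellow, anything else = red) are assumptions made purely for illustration.

```shell
#!/bin/sh
# Sketch only: run every executable file in a directory and report the
# worst result as a single color.
# ASSUMPTION (not taken from the hobbit-plugins source):
#   exit 0 = green, exit 1 = yellow, any other exit code = red.
run_misc_checks() {
    dir=$1
    worst=0
    for script in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        if [ ! -f "$script" ] || [ ! -x "$script" ]; then
            continue
        fi
        rc=0
        "$script" >/dev/null 2>&1 || rc=$?
        # Clamp unexpected exit codes to "red".
        if [ "$rc" -gt 2 ]; then
            rc=2
        fi
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    case $worst in
        0) echo green ;;
        1) echo yellow ;;
        *) echo red ;;
    esac
}
```

For example, run_misc_checks /etc/xymon/misc.d would print one color for the whole directory. Sending that color to a Xymon server is left out of the sketch.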
> The following was at the bottom of /var/log/messages, but it does not suggest any very obvious alarm strings to add other than the last line without the 'dm-0', but it would be nicer to have something more generic still as textual messages can change between different versions of the O/S.
>
> kernel: sd 0:0:0:0: Unhandled sense code
> kernel: sd 0:0:0:0: SCSI error: return code = 0x08100002
> kernel: Result: hostbyte=invalid driverbyte=DRIVER_SENSE,SUGGEST_OK
> kernel: sda: Current: sense key: Hardware Error
> kernel: Add. Sense: Defect list error
> kernel:
> kernel: Buffer I/O error on device dm-0, logical block 1358756
> kernel: lost page write due to I/O error on dm-0
That's probably something which can be caught via the LOG keyword in analysis.cfg.
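As a rough illustration of the LOG approach, the fragment below shows what such a rule might look like. The host pattern, the byte limit and the regular expression are assumptions chosen to match the kernel messages quoted above. They are untested, and as already noted in this thread the message text can differ between kernel versions.

```text
# client-local.cfg (server side): have clients send the tail of the log.
# The [linux] section and the 10240-byte limit are illustrative choices.
[linux]
log:/var/log/messages:10240

# analysis.cfg: go red when an I/O error line shows up in the log.
# The pattern is an assumption based on the messages quoted in this thread.
HOST=%.*
    LOG /var/log/messages %(Buffer I/O error|lost page write due to I/O error) COLOR=red
```

A rule like this only fires while the matching lines are still within the reported portion of the log, which is the "reported just the once" limitation mentioned in the original question.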
Kind regards, Axel Beckert
-- Axel Beckert <beckert at phys.ethz.ch> support: +41 44 633 26 68 IT Services Group, HPT H 6 voice: +41 44 633 41 89 Departement of Physics, ETH Zurich CH-8093 Zurich, Switzerland http://nic.phys.ethz.ch/
-----Original Message----- From: Axel Beckert [mailto:beckert at phys.ethz.ch] Sent: 09 March 2015 13:14 To: SebA Cc: xymon at xymon.com Subject: Re: Detecting read-only file system in Linux
Hi,
On Mon, Mar 09, 2015 at 12:44:03PM -0000, SebA wrote:
>> It seems that an RPM maintainer may have made some modifications to their version in order to catch disks in a read-only state (in the first link) and that there is a mount-ro plugin that is part of the hobbit-plugins package in Debian / Ubuntu. Does anyone have more information on either of these
>
> I'm one of the maintainers of Debian's hobbit-plugins package, so yes. :-)
Great, I thought as much!
>> and whether any patches can be integrated upstream or plug-ins added to xymonton?
>
> I'm not sure where exactly at https://wiki.xymonton.org/ I should add our set of plugins.
I'm not an expert, but I would have thought under monitors - that is where most of the existing 'things' are, and I expect that most of 'your' plugins are monitors? https://wiki.xymonton.org/doku.php/monitors
>> CCing Axel Beckert as he seems to have committed something to the mount-ro plugin recently: https://www.openhub.net/p/hobbit-plugins/commits
>
> Hrm, OpenHub seems horribly out of date with most projects recently... The full view on that Git repo is at https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/
>
> The source code of the mount-ro plugin is quite simple: https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/misc.d/mount-ro
>
> It's though not a direct plugin but meant for the meta-plugin "misc" which calls all scripts in /etc/xymon/misc.d/ and summarizes their exit codes into a single check. This is meant for checks which get yellow/red only very seldom and where you don't want to waste a whole column for it.
>
> misc plugin: https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/client-ext/misc
>
> Hobbit.pm used in the misc plugin and many other plugins in that package: https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/perl/Hobbit.pm
Thanks very much Axel: this should enable me to get these working on non-Debian. Are there any docs in the Git repo (or elsewhere)?
Kind regards,
Sebastian
Hi,
On 09 Mar 2015, at 13:44, SebA <spah at syntec.co.uk> wrote:
(…)
> Although we have some Debian systems, I was looking for a solution for another Linux distro.
>
> If I was to write something myself to do it, I would check /proc/mounts and the best command I could find was:
> awk '$4~/(^|,)ro($|,)/' /proc/mounts
> which outputs:
> /dev/root / ext3 ro,data=ordered 0 0
You could use the reported "Client data" / "clientlog" with a server-side extension. I have to admit that I'm not 100% sure whether remounts triggered by errors are reflected properly by this, but mount seems to use /proc/self/mountinfo as its data source, so it _should_ be OK.
Extract the reported mount-data for host bb.local:
xymon 127.0.0.1 "clientlog bb.local section=mount"
Some testing of the output:
(export host=bb.local; xymon 127.0.0.1:1984 "clientlog $host section=mount" | gawk '/[(,]+ro[,)]+/ { print "filesystem " $1 " mounted RO!" } ')
This would not require installing and maintaining an extension on every client. One drawback would be increased latency as the ro-check would be “indirect”.
Cheers Thomas
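The one-liner above handles a single named host. A minimal sketch of how the same check could be looped over several hosts follows. The function names, and the XYMON_CMD and XYMON_SERVER variables, are hypothetical. The host list is simply passed in as arguments, since how best to obtain it from the server is not covered in this thread. It also assumes the client reports mount-style lines of the form "<device> on <mount point> type <fstype> (<options>)".

```shell
#!/bin/sh
# Sketch only: server-side scan of each host's reported "mount" section.

# Read mount-style lines on stdin; print one message per read-only mount.
# $1: host name used as a prefix in the message.
find_ro_mounts() {
    awk -v host="$1" '/[(,]ro[,)]/ {
        print host ": " $1 " on " $3 " is mounted read-only"
    }'
}

# Fetch the mount section of one host's client data.
# XYMON_CMD and XYMON_SERVER are hypothetical overrides for testing.
fetch_mounts() {
    "${XYMON_CMD:-xymon}" "${XYMON_SERVER:-127.0.0.1}" "clientlog $1 section=mount"
}

# Check every host named on the command line.
check_hosts() {
    for host in "$@"; do
        fetch_mounts "$host" | find_ro_mounts "$host"
    done
}
```

Usage would be along the lines of check_hosts bb.local otherhost.local. As noted above, the result is only as fresh as the last client report, and it depends on what the client's mount command actually shows.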
Thanks Thomas. Simulating this suggests your code will work on our server, but as you said, you are not 100% sure that the 'ro' will surface in the mount command. The output here suggests that it may not: http://sisyphus.ru/en/srpm/Sisyphus/xymon/sources/8

#!/bin/sh
# Read data from /proc/mounts on linux and report back in the xymon client
# It gives more accurate data than the 'mount' command and can catch
# disks in a read-only state.
test -r /proc/mounts && exec cat /proc/mounts

But maybe that is only needed for some other version of linux... Our server is now fixed so I can't test it properly. But I do prefer a server-side extension, or even a patch to xymon-server, to client plugins that I have to install on every server. The issue with your code is that it only works for 1 named server ('bb.local'), so I would need to have something iterating over all hosts. Probably not hard, and probably something that someone here already does...
In fact, I think I have found the place to add it and that is in this add-on: https://wiki.xymonton.org/doku.php/monitors:check-client - then it will work for all hosts (correct me if I am wrong David Baldwin?)
Kind regards,
SebA
On 11 March 2015 at 00:44, Thomas Eckert <thomas.eckert at it-eckert.de> wrote:
> Just for future reference as you already found your solution:
>
> From looking at the util-linux package I understand that _if_ the /proc-filesystem is mounted the information is used from there.
I don't think this is the case. Running "strace" on the "mount" command shows that it first looks at /etc/mtab, and if that exists, it doesn't look at /proc/mounts. Only if /etc/mtab doesn't exist, does mount look at /proc/mounts.
So in a situation where the filesystem containing /etc is having problems and goes read-only, /etc/mtab won't get updated because it's now read-only.
J
On 11 Mar 2015, at 00:58, Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:
> On 11 March 2015 at 00:44, Thomas Eckert <thomas.eckert at it-eckert.de> wrote:
>> Just for future reference as you already found your solution:
>>
>> From looking at the util-linux package I understand that _if_ the /proc-filesystem is mounted the information is used from there.
>
> I don't think this is the case. Running "strace" on the "mount" command shows that it first looks at /etc/mtab, and if that exists, it doesn't look at /proc/mounts. Only if /etc/mtab doesn't exist, does mount look at /proc/mounts.
Strange, I did also check with strace and did not see /etc/mtab used. This is on Debian 7.x, Debian kernel 3.2.0-4-amd64 and util-linux 2.20.1-5.3
root@bb:~# strace -e trace=open mount 2>&1 | egrep 'mtab|proc'
open("/proc/filesystems", O_RDONLY) = 3
open("/proc/self/mountinfo", O_RDONLY) = 3
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> So in a situation where the filesystem containing /etc is having problems and goes read-only, /etc/mtab won't get updated because it's now read-only.
Agreed.
So re-using the mount-info for the r/o-check is at least highly dependent on the individual environment and versions.
Cheers Thomas
Thanks Jeremy, Thomas and Ben for your input. I have just run your command Thomas on a couple of our servers:
strace -e trace=open mount 2>&1 | egrep 'mtab|proc'
The old xymon server looks at /etc/mtab:
open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 3
none on /proc type proc (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
But the newer server, which runs xymon-client and had the disk that went read-only, opens /proc/mounts first:
open("/proc/mounts", O_RDONLY) = 3
open("/etc/mtab", O_RDONLY) = 3
proc on /proc type proc (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
Both servers are running a 2.6.x kernel, but the new server has a kernel released last month. So it does indeed appear to be highly dependent on the version. However, it looks like it will work on just about all of our servers except for our xymon server, so that's pretty good.
Kind regards,
SebA
I just had this happen to me, to several servers because of my ancient Sun storage array.
The major one was a file system that has files going in and out all the time, with consistent names. So I added to client-local.cfg:
[machine]
file:`ls /path/to/file-*|tail -1`
And to analysis.cfg:
HOST=machine
FILE %^/path/to/file-.* RED mtime<1200
On the other hand, you could write a script that runs the mount command and parses its output for 'ro' on a line. You'd also want to exclude the proc, sys, and dev type filesystems, as well as CD-ROM or ISO mounts.
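A script along the lines just described might look like the sketch below. It reads /proc/mounts rather than the output of mount, given the /etc/mtab concerns raised earlier in the thread. The function name list_ro_mounts and the list of skipped filesystem types are assumptions and certainly not exhaustive.

```shell
#!/bin/sh
# Sketch only: list read-only mounts, skipping pseudo and optical filesystems.
# $1: a file in /proc/mounts format (defaults to /proc/mounts).
list_ro_mounts() {
    awk '
        # ASSUMPTION: illustrative list of types to ignore, not exhaustive.
        $3 ~ /^(proc|sysfs|devtmpfs|devpts|tmpfs|cgroup|iso9660|udf|squashfs)$/ { next }
        # Match "ro" only as a complete option, so that options such as
        # errors=remount-ro do not trigger a false alarm.
        $4 ~ /(^|,)ro($|,)/ { print $2 " (" $1 ", " $3 ")" }
    ' "${1:-/proc/mounts}"
}
```

An empty result means no read-only mounts were found. A wrapper could turn non-empty output into a red status, but how that is reported to Xymon is left open here.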
On Monday, March 09, 2015 12:44:03 PM SebA wrote:
> Hi,
>
> I have been trying to find out if there is a way of Xymon detecting that a file-system in Linux has gone read-only as a result of a disk error (other than reporting it just the once via monitoring /var/log/messages). Nothing is showing up in my Xymon server, but my xymon-client is a bit old: xymon-client-4.3.7-26.1.el5.tnt
Looking to solve more or less the same problem, we went a different way. When you run fsck on an unmounted filesystem, you'll see an immediate message like "Filesystem has errors...", and you can read that flag even while the filesystem is mounted.
Here's a code snippet for ext2/3/4:
/sbin/debugfs -R "show_super_stats -h " /dev/sda1 2>&1 | /bin/grep -i "Filesystem state"
I've wrapped this into an "fscheck.sh" script that I run on all my servers to catch errors tracked by the file system. The fscheck script is now hundreds of lines long and traps filesystem errors for several different file system types (e.g. ZFS).
-Ben
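As an illustration of how the "Filesystem state" line from the debugfs command above could be turned into a status, here is a minimal sketch. It is not Ben's fscheck.sh. The function name fs_state_color and the mapping of states to colors are assumptions, based on the ext2/3/4 superblock states usually being reported as "clean", "not clean", or either of those followed by "with errors".

```shell
#!/bin/sh
# Sketch only: classify the "Filesystem state:" line read from stdin.
# ASSUMPTION: any state mentioning an error is red, "not clean" is yellow,
# "clean" is green; a missing line is reported as "unknown".
fs_state_color() {
    awk -F': *' '
        tolower($1) == "filesystem state" { state = $2 }
        END {
            if (state ~ /error/)          print "red"
            else if (state ~ /not clean/) print "yellow"
            else if (state ~ /^clean/)    print "green"
            else                          print "unknown"
        }
    '
}
```

It could be fed from the command shown above, for example: /sbin/debugfs -R "show_super_stats -h" /dev/sda1 2>&1 | fs_state_color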
participants (6)
- beckert@phys.ethz.ch
- jlaidman@rebel-it.com.au
- lists@benjamindsmith.com
- Paul.Root@CenturyLink.com
- spah@syntec.co.uk
- thomas.eckert@it-eckert.de