On 27-08-2012 17:19, Henrik Størner wrote:
The xymon.com server had a minor disk "hiccup" last Saturday. Unfortunately, this triggered a kernel panic and things went downhill from there, eventually causing the whole server to die last Monday, Aug. 20th.
It turns out it was more than just a hiccup: I was bitten by a firmware bug in my Crucial M4 SSD, see http://forum.crucial.com/t5/Solid-State-Drives-SSD/Firmware-Update-Notificat...
"an incorrect response to a SMART counter will cause the m4 drive to become unresponsive after 5184 hours of Power-on time. The drive will recover after a power cycle, however, this failure will repeat once per hour after reaching this point."
If any of you have Crucial M4 SSDs in use, I'd recommend checking the firmware version ASAP: it must be version 0309 or 000F. On Linux, "smartctl -a" (from smartmontools) will show you the firmware version.
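If you have several machines to check, a minimal Python sketch like the one below could parse the "smartctl -a" output for the "Firmware Version" line and the Power_On_Hours attribute. It assumes the usual smartctl text layout (firmware in the information section, raw value in the tenth column of the attribute table), assumes the device really is a Crucial M4 (it does not check the model), and treats only the two versions named above as fixed. The function names are hypothetical and not part of Xymon or smartmontools.

```python
import re
import subprocess

# Firmware versions named in the post as carrying the fix (assumption: this
# set is not exhaustive; add later releases you know include the fix).
FIXED_FIRMWARE = {"0309", "000F"}

# Power-on hours at which the bug first makes the drive unresponsive.
HANG_THRESHOLD_HOURS = 5184  # 216 days * 24 hours


def parse_firmware(smartctl_output):
    """Return the 'Firmware Version' value from smartctl output, or None."""
    match = re.search(r"^Firmware Version:\s*(\S+)", smartctl_output, re.MULTILINE)
    return match.group(1) if match else None


def parse_power_on_hours(smartctl_output):
    """Return the raw Power_On_Hours value (SMART attribute 9), or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Power_On_Hours":
            digits = re.match(r"\d+", fields[9])
            return int(digits.group()) if digits else None
    return None


def assess(smartctl_output):
    """Classify a (presumed) Crucial M4 from its 'smartctl -a' output."""
    firmware = parse_firmware(smartctl_output)
    hours = parse_power_on_hours(smartctl_output)
    if firmware is None:
        return "unknown: no firmware version found"
    if firmware in FIXED_FIRMWARE:
        return f"ok: firmware {firmware}"
    if hours is not None and hours >= HANG_THRESHOLD_HOURS:
        return f"update now: firmware {firmware}, {hours} power-on hours"
    return f"update soon: firmware {firmware}"


def check_device(device):
    """Run 'smartctl -a' on a device (needs root) and assess the result."""
    # smartctl's exit status is a bit mask that can be non-zero even when the
    # report was printed, so the output is parsed without checking the status.
    result = subprocess.run(["smartctl", "-a", device],
                            capture_output=True, text=True)
    return assess(result.stdout)
```

Run it as root, for example print(check_device("/dev/sda")), where the device path is only an example. Any version outside FIXED_FIRMWARE, including a newer release, is flagged conservatively, so extend the set if you know a later firmware also carries the fix.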
Regards, Henrik