On Thu, Feb 17, 2005 at 01:21:42PM -0500, Tom Georgoulias wrote:
Coincidence or not, it seems that some time after I applied the fix above and rebuilt hobbit, a hobbitd_larrd column appeared and stayed red, then purple, for a very long time. The error message was "fatal signal caught" or something like that.
Aha - all of the hobbitd programs have a built-in feature so that if they do crash, they'll try to let you know it happened by sending off a status message about themselves, like the one you saw. Since hobbitd_larrd doesn't normally send status messages, the column will eventually go purple.
I'll look over the code - there's probably something that needs more thorough error-checking to withstand all kinds of input.
PS: If you want me to look at that NetApp disk report that isn't being graphed, just send me an example of what such a report looks like.
Sure thing. See below, sorry about the line wrap. After seeing what you looked at in the CPU case, I think I know what the problem could be. The rest of my systems use the phrase "Disk partitions" while the filer uses "NetAPP Volumes". I poked at the do_disk.c code but was clearly out of my league when it came to fixing it.
A bit of experience with the code does help :-) The disk handler is one of the more complicated ones.
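To illustrate the header mismatch described above, here is a minimal sketch in C. It is not the actual do_disk.c code: the function name find_disk_table and the header list are hypothetical, and only the two phrases "Disk partitions" and "NetAPP Volumes" come from this thread. The idea is simply that a handler which looks for one fixed header phrase will skip a report that uses the other, unless both are accepted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of header phrases that introduce the disk table.
 * The two strings are the ones mentioned in this thread. */
static const char *disk_headers[] = { "Disk partitions", "NetAPP Volumes" };

/* Hypothetical helper: return a pointer to the text following the header
 * line, or NULL if the message contains none of the known headers.
 * Headers are tried in list order. */
const char *find_disk_table(const char *msg)
{
    size_t i;

    for (i = 0; i < sizeof(disk_headers) / sizeof(disk_headers[0]); i++) {
        const char *hit = strstr(msg, disk_headers[i]);
        if (hit != NULL) {
            /* Skip to the end of the header line, if there is one. */
            const char *eol = strchr(hit, '\n');
            return (eol != NULL) ? eol + 1 : hit + strlen(hit);
        }
    }
    return NULL; /* unknown report layout, nothing to graph */
}
```

A caller would hand the returned pointer to whatever parses the individual rows; the differing column order would still have to be handled there.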
The column ordering is different too, although I can reorder it in the Perl script to match the other Linux-style systems if needed.
That won't be necessary.
I think I have something now that appears to work. I'll send you the latest source files directly so you can test them, and the fix will then be in the next release.
Regards, Henrik