Unfortunately, no, I can't do this, as our Hobbit server monitors production machines. The data directory for the RRD files is SAN-mounted, and we haven't had disk corruption issues before with this type of setup.
The strange thing is, this only started within the past week, and unfortunately it seems to be spreading to more and more RRD graphs.
On Nov 29, 2007 3:41 PM, Josh Luthman <josh at imaginenetworksllc.com> wrote:
Can you do a dd if=/dev/sda of=/dev/null from the disk on which the RRD files are stored? Since the problem seems so random, I'm curious to see whether the filesystem is having problems. I have my money on a bug in the software or a bad disk/filesystem/controller.
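The dd suggestion is a plain read test: read every block of the device and discard the data, so any unreadable sector surfaces as an I/O error. Below is a minimal sketch of the same idea, assuming a Unix host with Python 3 (`os.pread` is Unix-only). `read_scan` is a hypothetical helper, not part of Hobbit or dd; unlike a plain dd run it records the offset of each failed chunk and keeps going. Like dd, it only shows whether the blocks can be read; it does not check filesystem consistency or the contents of the RRD files.

```python
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read, roughly like dd bs=1M


def read_scan(path, chunk_size=CHUNK_SIZE):
    """Read every byte of `path` and throw it away, like
    `dd if=PATH of=/dev/null`, but keep going past read errors.

    Returns (bytes_read, errors), where errors is a list of
    (offset, message) tuples for chunks that could not be read.
    """
    bytes_read = 0
    errors = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Size of a regular file, or of a block device on Linux.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                chunk = os.pread(fd, length, offset)
            except OSError as exc:
                # Record the failing region and skip past it.
                errors.append((offset, exc.strerror or str(exc)))
                offset += length
                continue
            if not chunk:
                break  # unexpected EOF: stop instead of looping forever
            bytes_read += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return bytes_read, errors
```

Usage would be something like `read_scan("/dev/sda")` run as root; an empty error list means every block was readable. On a SAN-backed production volume this still reads the entire device, so it adds the same I/O load as the dd command.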
On 11/29/07, Gary Baluha <gumby3203 at gmail.com> wrote:
I don't know how many hosts are affected, percentage wise, but it's definitely not every host. And for the hosts having the problem, it's not even the same graphs that are having the problem.
On Nov 29, 2007 3:11 PM, Josh Luthman <josh at imaginenetworksllc.com> wrote:
Same OS at home?
Not sure if you mentioned this or not, but does that weird value show up in all RRD graphs or just on a few hosts?
On 11/29/07, Josh Luthman <josh at imaginenetworksllc.com> wrote:
Do they monitor the same devices? I think there has to be some similarity between the two, as they had the same problem at the same time (though this isn't 100%, it's logically the first place to look). Hardware isn't much of a concern here, as they don't communicate, and the chances of both servers going bad on the same date are astronomically low.
Are there any kind of auto updating services running on them?
On 11/29/07, Gary Baluha <gumby3203 at gmail.com> wrote:
We only have two Hobbit servers, and it is affecting both machines. No, these two Hobbit machines do _not_ communicate with each other in any way.
On Nov 29, 2007 2:33 PM, Josh Luthman <josh at imaginenetworksllc.com> wrote:
Is this problem not showing up on another Hobbit server? Do the two Hobbit servers with this problem communicate at all (share data/SNMP traffic/etc)?
On 11/29/07, Gary Baluha <gumby3203 at gmail.com> wrote:
> On Nov 29, 2007 12:01 PM, Josh Luthman <josh at imaginenetworksllc.com> wrote:
> > This is completely beyond my knowledge, but the first place I would look is any hardware problems, any recent changes (obviously =) and the similarities between those two Hobbit servers' issues.
>
> That's the thing, there aren't any similarities between these two machines. They are different hardware, different OS, different network segment, and different hosts being monitored.
>
> There were some recent changes in the past month to one of the Hobbit servers, with a bunch of custom RRD graphs added. But this wasn't done on the other Hobbit server. The only thing changed on the other Hobbit server is more HTML web checks added; nothing out of the ordinary.
-- Josh Luthman Office: 937-552-2340 Direct: 937-552-2343 1100 Wayne St Suite 1337 Troy, OH 45373
Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer