On Nov 30, 2007 10:53 AM, Hubbard, Greg L <greg.hubbard at eds.com> wrote:
Gary,
This is pretty hard to decipher from "afar".
I think I remember you saying that when you dump the data it is always okay?
Actually, it turns out this is not true. The rrd file does indeed contain the bad data. I just didn't notice it before, but now that the problem appears to be getting worse, the bad data is quite obvious.
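For anyone wanting to check their own files the same way, here is a minimal sketch of scanning the text produced by `rrdtool dump file.rrd` for implausibly large values. The function name, the regular expression, and the `1e9` threshold are illustrative assumptions, not part of Hobbit or rrdtool; pick a threshold that suits the metric being graphed.

```python
import re
from collections import Counter

# Matches the <v> ... </v> cells that `rrdtool dump` writes for each row.
V_RE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")

def suspicious_values(dump_text, threshold=1e9):
    """Count values in an rrdtool XML dump whose magnitude is >= threshold.

    `threshold` is an assumed cut-off for "obviously wrong", not a value
    taken from rrdtool itself.
    """
    counts = Counter()
    for raw in V_RE.findall(dump_text):
        try:
            value = float(raw)
        except ValueError:
            continue  # skip anything that is not a number
        if value != value:
            continue  # NaN means "unknown" in an RRD; ignore it
        if abs(value) >= threshold:
            counts[value] += 1
    return counts
```

Running this over the dumps of several affected files and comparing the returned counters would show whether the same large number turns up in each of them.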
Some wild thoughts:
a) could there be two different processes updating the same RRD files?
I don't believe so. The strange thing is, all of the graphs that become corrupted have the exact same large number that is being input into the rrd data files.
b) are all servers using the same version of rrdtool?
No. One is running 1.2.23, the other 1.2.26. Both have the problem.
c) are the hobbitgraph files okay? I have proven to my satisfaction that hobbitgraph definition errors can make the graphs act funny.
They haven't changed since before the graphs were having this problem.
d) if this stuff is on a SAN, can it be moved to local storage?
It is on the SAN on one of the machines, and on local storage on the other. I was thinking of temporarily moving the data directory and having Hobbit regenerate all the data from scratch. I'm trying to avoid this, since that would mean losing a year's worth of trend data that has proven itself very useful. Still, if it helps me narrow down the problem, I'll consider this (and move the data back once I get my answer).
I am just "fishing." Sometimes, when I am at my wit's end, I just change SOMETHING to see if it makes a difference. Even WORSE can help get me started.
GLH
From: Gary Baluha [mailto:gumby3203 at gmail.com]
Sent: Friday, November 30, 2007 9:25 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] strange graph behavior - random machines & graphs
This now appears to be becoming a more serious problem. More and more graphs are being affected, and I still have no explanation for what is going on here. It also seems that almost any newly created graph (for example, after I delete, rename, or move an existing .rrd file) starts off corrupted immediately. :-(
On Nov 28, 2007 10:08 AM, Gary Baluha <gumby3203 at gmail.com> wrote:
I have recently noticed a strange thing happening with some of the rrd graphs generated by Hobbit. When you look at the graph, it looks as though the rrd data is in one format (gauge), but the graph is rendering it in a different format (derive). I can't seem to find any pattern to the hosts or tests that are exhibiting this strange behavior, and it is only happening on a handful of graphs. I have attached a picture of one of these graphs, since I'm not really sure how to describe it. Note the huge numbers displayed on the curr/min/avg/max line.
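To make the gauge/derive distinction concrete, here is a simplified sketch of how the same samples read under each interpretation. It deliberately ignores rrdtool's step normalization, heartbeat, and min/max handling, and the function names are made up for illustration.

```python
def gauge_series(samples):
    """GAUGE: each reading is stored as-is."""
    return [value for _, value in samples]

def derive_series(samples):
    """DERIVE: store the rate of change between consecutive readings.

    `samples` is a list of (unix_time, value) pairs in time order.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates

# Hypothetical readings taken 300 seconds apart.
samples = [(0, 100), (300, 400), (600, 400)]
print(gauge_series(samples))   # the raw readings
print(derive_series(samples))  # per-second change between readings
```

A value that makes sense as a gauge reading can look wildly out of scale when treated as a rate, and the reverse is also true.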
Any idea what's going on here? When I dump the RRD file manually, everything looks okay. I'm running Hobbit 4.2.0 with the 2007-02-09 allinone patch (I believe the latest). This has only happened in the past few weeks, though when exactly it started, I don't know. Any ideas?