Hello all,
I'm running a large test install of Xymon, with a couple hundred hosts reporting client status back through a bbmessage.cgi->bbproxy->hobbitd chain over port 80.
Things were working great with a stable 4.2.0 running on the hobbit server and Xymon-4.3.0.b2 on the proxy server. However, when I upgraded the hobbitd server to 4.3.0.b2, I lost reliable RRD graphing. Client reports are still coming in and status is being updated properly, but the RRD graphers seem to be getting usable data only intermittently. See the attached image for an example.
I've got a few manual custom graphs being generated with the NCV directive, and that stuff is being graphed just fine - it's only the built-in client reporting that suddenly stopped.
I tried clearing out all the RRD files after the upgrade, thinking there might be some sort of compatibility issue, but I'm still getting the same problem. USR2 logging on the hobbit_rrd processes shows nothing obvious, other than the ubiquitous "illegal attempt to update using time ... when last update time is ... (minimum one second step)" errors. There seem to be lots of client reports being logged by RRD, but I'm not certain I'm reading the log properly. An rrdtool dump shows lots of NaN entries in the RRD files. Thinking that a change in memory usage might be truncating the client reports that bbproxy combines into combo messages, I increased all the MAXMSG_* variables to 2048, but that didn't seem to have any effect.
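For anyone who wants to put a number on the NaN entries rather than eyeballing the dump, here is a minimal sketch in Python. It assumes the usual XML layout of "rrdtool dump" output, where each archived sample is a <row> element containing one <v> value per data source. The function name nan_summary is hypothetical and is not part of Xymon or rrdtool.

```python
import re

# Assumption: `rrdtool dump` writes each archived sample as
# <row><v>value</v>...</row>, with one <v> per data source.
ROW_RE = re.compile(r"<row>(.*?)</row>", re.DOTALL)
VALUE_RE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def nan_summary(dump_text):
    """Return (total_rows, rows_with_nan) for the text of an rrdtool dump."""
    total = 0
    with_nan = 0
    for row in ROW_RE.finditer(dump_text):
        values = VALUE_RE.findall(row.group(1))
        if not values:
            # Skip rows where no values could be parsed.
            continue
        total += 1
        # Count the row if any data source is NaN (case-insensitive match).
        if any(v.lower() == "nan" for v in values):
            with_nan += 1
    return total, with_nan
```

To use it, save the output of "rrdtool dump" for one RRD file to a text file, read that file into a string, and pass the string to nan_summary. Note that a freshly created RRD file is mostly NaN until it fills up, so the ratio only means something for rows covering a period when data should have been arriving.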
If anyone has run into this before or has a suggestion on where to start, I'd appreciate it.
Regards,
Japheth Cleaver