I'm only back at that client again on Wednesday. But I do have remote access, so I will see what I can do this evening.
On 25 February 2015 at 18:06, Jeremy Laidman <jlaidman at rebel-it.com.au> wrote:
On 25 February 2015 at 19:16, Vernon Everett <everett.vernon at gmail.com> wrote:
These hosts all have nothing at all to do with the storage arrays being monitored, which makes me think the client data might be a red herring.
Yup, makes sense.
My best guess is memory corruption within xymond. So let's see if the corruption is visible in the messages being passed between xymond and xymond_channel. If we see corrupt messages in there, we can start to delve into the source code to see if there's a bug somewhere. Are you able to run your own instance of xymond_channel? Maybe something like this:
sudo -u xymon xymoncmd xymond_channel --channel=data --filter=zmem cat
Once you get an idea of what it looks like, change "cat" to something like "egrep -A5 ^@" to get only the first few lines of each message (each line starting with "@" plus the 5 lines after it). Also, redirect the output to a file until you notice a dodgy RRD file, and then kill the process.
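Once you have a capture file, something like the following might help with a first pass over it. This is only a rough sketch: the file name is a placeholder I made up, the function name is hypothetical, and it assumes the corruption (if there is any) shows up as bytes outside printable ASCII, which may not be the case. It also assumes GNU grep, for the "-a" option.

```shell
# Placeholder path (an assumption): wherever the xymond_channel output was redirected.
CAPTURE="${CAPTURE:-/tmp/xymond_channel-data.log}"

# Hypothetical helper: print, with line numbers, every line of the given file
# that contains a byte which is neither printable ASCII nor whitespace.
# LC_ALL=C makes the character classes byte-based; -a stops grep from
# treating the file as binary and suppressing the matching lines.
find_suspect_lines() {
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$1"
}
```

Running find_suspect_lines "$CAPTURE" prints nothing if every line is clean. Any line numbers it does print are a starting point for looking at the surrounding message in the capture.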
Did you try running xymond with "--dbghost=HOSTNAME"? The output might be too voluminous, but it could be worth a try if you can manage to snag the traffic at the right time.
J
-- "Accept the challenges so that you can feel the exhilaration of victory"
- General George Patton