Re: Axel Beckert 2016-06-15 <20160615155816.GD29167 at phys.ethz.ch>
In the past few months I found more and more indications of a strange bug in (at least) Xymon 4.3.27 which occasionally mixes up hosts when handling reports:
- Machines with a single disk (e.g. VMs) occasionally report the status of a "raid" test which is not deployed to them -- and then (for obvious reasons) go purple on it. On that server, there's only one machine having a RAID, but its "raid" reports have been misassigned to at least three other hosts, all hosts which have rather many tests (compared to a bunch of sensors which send in only very few tests per host). [...]
Fwiw, I've seen instances of such behavior ever since I started taking care of a hobbit installation at a customer site in late 2007. The symptom is randomly mixed-up hosts. I can't say whether some tests are hit more than others; the problem is mostly visible through disk tests, where rrd files show up on disk for partitions that do not exist on the host in question.
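For anyone who wants to look for the same symptom, the check for stray disk rrd files can be sketched roughly as below. This is only a minimal sketch under assumptions: it assumes the usual per-host rrd directory layout with files named like disk,root.rrd and disk,usr,local.rrd (commas standing in for slashes), and the function names and the expected_mounts mapping (host name to the set of mount points that host really has) are hypothetical and have to be supplied by you.

```python
from pathlib import Path


def mount_from_rrd_name(filename):
    """Map 'disk,usr,local.rrd' to '/usr/local' (assumed naming scheme)."""
    stem = filename[len("disk,"):-len(".rrd")]
    if stem == "root":
        return "/"
    return "/" + stem.replace(",", "/")


def find_stray_disk_rrds(rrd_root, expected_mounts):
    """Return {host: [unexpected mount points]} for the hosts in expected_mounts.

    rrd_root is the directory holding one subdirectory per host;
    expected_mounts is a hypothetical, hand-maintained mapping.
    """
    stray = {}
    for host, mounts in expected_mounts.items():
        host_dir = Path(rrd_root) / host
        if not host_dir.is_dir():
            continue  # no rrd data for this host at all
        found = sorted(
            mount_from_rrd_name(p.name) for p in host_dir.glob("disk,*.rrd")
        )
        unexpected = [m for m in found if m not in mounts]
        if unexpected:
            stray[host] = unexpected
    return stray
```

Anything it returns is only a candidate: an rrd file left over from a partition that was removed long ago looks the same as one created by a misassigned report.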
It doesn't seem to happen constantly but rather in bursts, though I don't have hard data on that. My impression was that it only happens during busy periods, but that could be totally wrong.
We were on 4.3.0 for a long time before finally upgrading about two years ago, and I thought the problem was gone then, but what Axel is describing is exactly what we were (are?) seeing there.
Christoph