On 15-08-2011 22:46, Poppy, Ben wrote:
I'm having a pretty strange issue. We have our existing hobbit servers running on Fedora servers running hobbit 4.2.0. I'm working on installing brand new servers that will be running CentOS 6 64-bit and the latest version of xymon (4.3.3 before I saw 4.3.4 today).
[installs and starts 4.3 version]
Within a few minutes, 4 servers turn to red alerts on CONN on the existing Fedora-based Hobbit servers. They keep flapping in and out of red alert until I shut down the new CentOS xymon server. Within a few minutes of the new server being shut down, the alerts go away for good.
I have tried CentOS 5 32-bit and 64-bit, and even xymon 4.2.3 or going all the way back to hobbit 4.2.0, all with the same result and the exact same 4 servers each time.
As I understand it, you were running both versions simultaneously. Did those servers also go red on the new Xymon version, or only on the old one? If they were also red on the new server, did you try stopping the network tests on the old server, and did that make a difference?
Which ping tool are you using - xymonping or fping?
I haven't heard of anything like this before, but I suspect it may be an issue with the way "ping" works. When routing traffic, most systems will pass ping-traffic with a low priority, so it is quite easy for ping-requests and -responses to be dropped. Since xymonping and fping pump out a lot of ping-traffic rather quickly, maybe the new server just happened to be more "lucky" with its data than the old one - perhaps due to the switch port it is on, or the speed of the network interface and so on.
It might be worthwhile to make sure that the old and the new systems do not run the network tests at the same time: keep an eye (with "ps") on when the network test runs on the old system, and don't start Xymon on the new system until about 30 seconds after the old system completes its network tests. (This assumes your network tests don't take more than a couple of minutes, so there is time for both systems to run their tests within the default 5-minute interval.)
Regards, Henrik