29 Jan 2016, 7:56 a.m.
Hi,
I've been running Xymon for 6 years (currently 4.3.17) on Debian 7.8 (kernel 3.2.0-4-amd64). For about a month now, every night between 0:30 and 2:00 a.m. (+/- 30 min), around 30 hosts become unreachable:
Fri Jan 29 01:16:38 2016 conn NOT ok : DNS lookup failed
Unable to resolve hostname foo.bar.local
System unreachable for 3 poll periods (170 seconds)
green 0.0.0.0 is alive (0.02 ms) [<- 127.0.0.1]
- I have around 500 monitored hosts, and it looks like the same hosts are lost every single night.
- Those hosts are not necessarily on the same network and do not run the same OS.
- We cross-monitor the same hosts, and the other monitoring tool has not reported any DNS outage.
- I ran a DNS lookup every second on the Hobbit server for several days, and it never reported a DNS outage.
- I don't have any cron job installed on the server that could disturb Xymon.
- Nothing strange in the Xymon logs or the server logs, no memory leak or CPU overload.
- The rest of the day, the Xymon server behaves normally.
- What did I change on the server a month ago? Nothing that I know of: no system upgrade or anything similar.
- I had dnsmasq acting as a cache; I disabled it and the issue is the same.
- /etc/resolv.conf is minimal: search bar.local, then on the next line nameserver IP.OF.OUR.DNS.SERVER1, just like on our other servers.
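
A per-second lookup like the one mentioned in the list above could be sketched as follows. This is only a minimal sketch under assumptions, not the exact check that was run: the function names `probe` and `watch`, the log format and the one-second interval are illustrative. It also records how long each lookup takes, so slow answers during the nightly window show up as well as outright failures. Note that it goes through the system resolver (`socket.getaddrinfo`), which may not be the same resolution path the monitoring tool itself uses.

```python
import socket
import time
from datetime import datetime


def probe(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once and return (ok, elapsed_ms, detail)."""
    start = time.monotonic()
    try:
        infos = resolver(hostname, None)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] holds the resolved address.
        detail = ",".join(sorted({info[4][0] for info in infos}))
        ok = True
    except socket.gaierror as exc:
        # Resolution failed: keep the resolver's error text for the log.
        detail = str(exc)
        ok = False
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return ok, elapsed_ms, detail


def watch(hostnames, interval=1.0, rounds=None,
          resolver=socket.getaddrinfo, log=print):
    """Probe every hostname once per round; return the number of failures.

    rounds=None means "run until interrupted".
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        for name in hostnames:
            ok, elapsed_ms, detail = probe(name, resolver)
            if not ok:
                failures += 1
            stamp = datetime.now().isoformat(timespec="seconds")
            status = "OK" if ok else "FAIL"
            log(f"{stamp} {status} {name} {elapsed_ms:.1f}ms {detail}")
        done += 1
        time.sleep(interval)
    return failures


# Example (runs until interrupted; foo.bar.local is the placeholder
# hostname from the status message above):
#   watch(["foo.bar.local"], interval=1.0)
```

Redirecting the output to a file and leaving it running overnight would give timestamps to compare against the time of the Xymon alerts.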
The issue could be anywhere: inside or outside the server, in Xymon or not... I have to confess I'm running out of ideas for tracking it down. If anyone here has some leads, I would be thankful!
Have a nice day!