Odd named errors, perhaps related to Xymon or Devmon?
We've been getting an average of 40+/day of these messages from the caching nameserver process on one of our Xymon/Devmon servers:
named[N]: sockmgr 0xN: maximum number of FD events (64) received
This server is running bind-9.3.6-16.P1 (old, I know, but it's the latest for the distro we are using). Is anyone else out there seeing the same errors?
I'm assuming that Xymon/Devmon (or one of our ext scripts) might be inducing this condition, since this server doesn't run any other processes that would be doing a large number of DNS lookups. This server has over 600 entries in hosts.cfg and 92 entries in the Devmon hosts.db.
I have tried reproducing the problem by making a large number of DNS lookups as simultaneously as possible, but with no luck.
The xymonnet status only lists 42 failed DNS lookups out of 797 calls to dnsresolve. Any suggestions on how to find out what specific process is causing named to make periodic high demands for sockets?
-- Simeon Berkley
Systems Engineer McClatchy Interactive phone: 919-861-1244 fax: 919-861-1300 mobile: 919-302-3063 e-mail: sberkley at mcclatchyinteractive.com AIM: sberkleymi www.mcclatchyinteractive.com
On Tue, Sep 13, 2011 at 1:49 AM, Berkley, Simeon < sberkley at mcclatchyinteractive.com> wrote:
We've been getting an average of 40+/day of these messages from the caching nameserver process on one of our Xymon/Devmon servers:
named[N]: sockmgr 0xN: maximum number of FD events (64) received
This is a known condition (seems to be more common on Solaris) that you shouldn't need to worry about. You can get rid of the message by recompiling BIND, if you're so inclined.
It's related to the way BIND asks the OS for any sockets with available data. BIND uses epoll_wait() to fill a buffer with the sockets that have data, but it gives epoll_wait() a maximum number of sockets to return, so as to avoid overflowing that buffer. If the number of sockets with data reaches the compiled-in maximum (64 by default), BIND logs the message, handles what epoll_wait() returned, and then loops around again to pick up the rest. The message does not indicate any lost data.
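To illustrate the loop described above, here is a minimal Python sketch (Linux only, since it relies on epoll). It is not BIND's code; the function name drain_in_batches and the socketpair setup are made up for the demonstration. It makes a number of sockets readable, then polls with a cap on returned events and keeps looping until nothing is left.

```python
import select
import socket

MAX_EVENTS = 64  # same cap as in the log message; used here for illustration


def drain_in_batches(n_ready, max_events=MAX_EVENTS):
    """Make n_ready sockets readable, then drain them with a capped poll.

    Returns the number of events handled on each pass through the loop.
    (Hypothetical helper for this demonstration, not part of BIND.)
    """
    pairs = [socket.socketpair() for _ in range(n_ready)]
    readers = {reader.fileno(): reader for reader, _ in pairs}
    ep = select.epoll()
    batches = []
    try:
        for reader, writer in pairs:
            writer.send(b"x")  # one pending byte makes the reader readable
            ep.register(reader.fileno(), select.EPOLLIN)
        while True:
            # Ask for at most max_events ready descriptors, without blocking.
            events = ep.poll(0, max_events)
            if not events:
                break
            if len(events) == max_events:
                # A full buffer means more sockets may still be waiting.
                print(f"maximum number of FD events ({max_events}) received")
            for fd, _mask in events:
                readers[fd].recv(16)  # consume the data so the fd goes quiet
            batches.append(len(events))
    finally:
        ep.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()
    return batches
```

Calling drain_in_batches(100) needs more than one pass because of the cap, yet every socket is eventually serviced, which is why the message is informational rather than a sign of dropped data.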
I have tried reproducing the problem by making a large number of DNS lookups as simultaneously as possible, but with no luck.
Have you tried using "queryperf" from the BIND source?
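queryperf reads a data file with one query per line, in the form "name type". A rough Python sketch for building such a file from hosts.cfg follows. It assumes the usual "IP hostname # tags" layout for host entries; hosts_to_queryperf is a made-up helper name, and querying every host for an A record is only one possible choice.

```python
import ipaddress


def hosts_to_queryperf(lines, qtype="A"):
    """Turn hosts.cfg-style lines into queryperf input lines ("name type").

    Hypothetical helper: assumes host entries look like "IP hostname # tags".
    """
    queries = []
    for line in lines:
        # Drop the tag/comment part, then split the rest into fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # directives such as "page" or "group", not a host entry
        queries.append(f"{fields[1]} {qtype}")
    return queries
```

Write the returned lines to a file and hand it to queryperf as its data file (the -d option, with -s for the server address, if your build matches the one shipped in the BIND contrib directory; check its README).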
The xymonnet status only lists 42 failed DNS lookups out of 797 calls to dnsresolve. Any suggestions on how to find out what specific process is causing named to make periodic high demands for sockets?
Enable query logging and find out which domains are queried most often during the intervals when the message appears. This might give you a hint.
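Query logging can usually be toggled with "rndc querylog". Once it is on, something like the following Python sketch can tally the log. It assumes log lines of the form "... client 127.0.0.1#53321: query: www.example.com IN A +"; the exact layout varies between BIND versions, so the regular expression may need adjusting, and top_queries is a made-up helper name.

```python
import re
from collections import Counter

# Assumed query-log layout: "client ADDR#PORT...: query: NAME CLASS TYPE ...".
# Newer BIND versions insert "@0x..." after "client"; that part is optional here.
QUERY_RE = re.compile(
    r"client\s+(?:@\S+\s+)?(?P<client>[0-9A-Fa-f.:]+)#\d+.*?"
    r"query:\s+(?P<name>\S+)\s+(?P<qclass>\S+)\s+(?P<qtype>\S+)"
)


def top_queries(lines, n=10):
    """Return the n most common (name, type) pairs and client addresses."""
    names = Counter()
    clients = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if not match:
            continue  # not a query-log line
        names[(match.group("name").lower(), match.group("qtype"))] += 1
        clients[match.group("client")] += 1
    return names.most_common(n), clients.most_common(n)
```

Feeding it only the log lines from around the time of each "maximum number of FD events" message should show which names dominate during the bursts, and that in turn may point at the test or script responsible.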
If you're only using BIND as a local caching nameserver, you could consider instead using nscd, the name service caching daemon. For most applications, this is a more lightweight caching resolver. It's not suitable for everyone, but it might be just fine for your purpose, and you would no longer see the error messages.
Cheers,
Jeremy
participants (2)
- jlaidman@rebel-it.com.au
- sberkley@mcclatchyinteractive.com