I have a couple of name servers which are not pingable. I have them in my hosts.cfg as follows:
10.10.10.10 ns1-int.foo.com # noconn dns=ns1.foo.com
0.0.0.0 ns2-int.foo.com # noconn dns=ns2.foo.com
The first returns a green dns test result. The second returns a red "Service dns on ns2-int.foo.com is not OK : Service unavailable." I can't figure out why. I don't want to embed ip addresses in my hosts.cfg, and I can't think why I should need to.
There is nothing telling when I run the test interactively with:
xymoncmd xymonnet --no-update --debug ns1-int.foo.com
nor for ns2-int.foo.com
When I snoop the interface while doing the above tests, the differences are significant:
For ns1-int, I see xymon look up ns1-int on the normal name server. Then I also see it request ns1.foo.com from ns1-int.foo.com and get the correct response. I'm not sure why the reverse lookup is there.
xymona -> d.foo.com DNS C ns1-int.foo.com. Internet Addr ?
d.foo.com -> xymona DNS R ns1-int.foo.com. Internet Addr 10.10.10.10
xymona -> ns1-int.foo.com DNS C ns1.foo.com. Internet Addr ?
ns1-int.foo.com -> xymona DNS R ns1.foo.com. Internet Addr 192.168.123.1
xymona -> d.foo.com DNS C 10.10.10.10.in-addr.arpa. Internet PTR ?
d.foo.com -> xymona DNS R 10.10.10.10.in-addr.arpa. Internet PTR ns1-int.foo.com.
For ns2-int, however, the result is much simpler. I see xymon look up ns2-int on the normal name server... and then nothing.
xymona -> d.foo.com DNS C ns2-int.foo.com. Internet Addr ?
d.foo.com -> xymona DNS R ns2-int.foo.com. Internet Addr 10.20.20.20
When I change the line in hosts.cfg for ns2-int from 0.0.0.0 to an ip address, it works fine.
What is going on here? Why isn't the DNS test using the address it has obviously retrieved from DNS?
Do things because you should, not just because you can.
John Thurston 907-465-8591 John.Thurston at alaska.gov Enterprise Technology Services Department of Administration State of Alaska
On Tue, April 5, 2016 4:22 pm, John Thurston wrote:
- snip -
What is going on here? Why isn't the DNS test using the address it has obviously retrieved from DNS?
It might be because DNS is tested distinct from the resolution involved in initial xymonnet setup. In particular, pinging is done in parallel with the TCP work, and IIRC there's a point where a lookup happens regardless (although it's suppressed with testip indicated).
There's a latent bug in taking 0.0.0.0 as a not-invalid IP address which might be part of this too... Even if it's not used, an IP other than that (but without testip) might give you the behavior you'd wanted.
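To illustrate the 0.0.0.0 point, here is a minimal Python sketch. It is not xymonnet's actual C code, and `pick_test_address`, `resolve`, and `fake_dns` are hypothetical names made up for the example. It assumes the usual convention that 0.0.0.0 in hosts.cfg is a placeholder meaning "resolve the hostname", and shows that such an address parses as valid, so it has to be checked for explicitly before a test uses it as a target.

```python
import ipaddress

def pick_test_address(cfg_ip, hostname, resolve):
    """Return the address a network test should connect to.

    cfg_ip   -- the IP column from hosts.cfg (may be "0.0.0.0")
    hostname -- the hostname column from hosts.cfg
    resolve  -- callable mapping a hostname to an IP string (hypothetical hook)
    """
    addr = ipaddress.ip_address(cfg_ip)
    if addr.is_unspecified:
        # 0.0.0.0 parses as a valid address, but here it is only a
        # placeholder; use the resolved address as the test target instead.
        return resolve(hostname)
    return cfg_ip

# Stand-in for the normal resolver, using the addresses from the snoop output.
fake_dns = {"ns2-int.foo.com": "10.20.20.20"}

print(pick_test_address("10.10.10.10", "ns1-int.foo.com", fake_dns.__getitem__))
print(pick_test_address("0.0.0.0", "ns2-int.foo.com", fake_dns.__getitem__))
```

If any test path skips a check like this, it would be handed 0.0.0.0 rather than the address that was already looked up, which may or may not be what is happening here.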
HTH, -jc
On 4/5/2016 5:50 PM, J.C. Cleaver wrote:
- snip -
It might be because DNS is tested distinct from the resolution involved in initial xymonnet setup.
I think it is more than that. Please consider this test condition:
10.10.10.10 ns1-int.foo.com # noconn dns=ns1.foo.com
0.0.0.0 ns2-int.foo.com # noconn dns=ns2.foo.com
0.0.0.0 ns3-int.foo.com # noconn dns
In this case, snooping the network while explicitly running xymonnet shows very different results for ns2-int and ns3-int
When testing ns2-int, there is a query to the normal resolver asking for an address for ns2-int.foo.com and there is an answer. That is all.
When testing ns3-int, the query to the normal resolver for ns3-int.foo.com is performed (and answered). This is followed by a query to ns3-int.foo.com for ns3-int.foo.com.
The presence of the equal-sign separated parameter to the DNS test is breaking the test function. In my tests, the equal-sign syntax is only functional if the host to be tested is specified with an ip address. This is true even if xymonnet is called with --dns=only
I suspect xymonnet is finding the equal sign but not parsing the parameters and adding them to the query list. The dns-test code is then being asked to perform a lookup for nothing, and is doing exactly that. The non-results are then evaluated, no success markers are found, and the test is reported as RED.
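To make that suspicion concrete, here is a minimal Python sketch of the behaviour I would expect versus what I think is happening. This is only a model of my guess, not the code in xymonnet.c; the function names `names_to_query` and `evaluate` are made up for illustration, and only the two tag forms used in my hosts.cfg (`dns` and `dns=NAME`) are handled.

```python
def names_to_query(tag, hostname):
    """Expand a hosts.cfg dns tag into the names to ask the server for."""
    if tag == "dns":
        return [hostname]              # bare tag: look up the host's own name
    if tag.startswith("dns="):
        return [tag[len("dns="):]]     # dns=NAME: look up NAME instead
    return []

def evaluate(answers):
    """Colour a DNS test from per-name results (True means a good answer)."""
    # An empty list contains no success markers, so it must come out red.
    # (Note all([]) is True, hence the explicit emptiness check.)
    return "green" if answers and all(answers) else "red"

print(names_to_query("dns", "ns3-int.foo.com"))
print(names_to_query("dns=ns2.foo.com", "ns2-int.foo.com"))
print(evaluate([True]))
print(evaluate([]))
```

The last call is the case I think I am hitting: if the name after the equal sign never reaches the query list, nothing is sent to the server, nothing comes back, and the evaluation has no choice but to report red.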
These tests have been performed on Solaris 10 with 4.3.26. I'm going to try similar tests on Linux on 4.3.17 and see how those results compare.
On 4/6/2016 9:23 AM, John Thurston wrote:
- snip -
The presence of the equal-sign separated parameter to the DNS test is breaking the test function. In my tests, the equal-sign syntax is only functional if the host to be tested is specified with an ip address. This is true even if xymonnet is called with --dns=only
- snip-
These tests have been performed on Solaris 10 with 4.3.26. I'm going to try similar tests on Linux on 4.3.17 and see how those results compare.
I have confirmed that I see the same broken results on Linux with 4.3.17.
I hadn't experienced this defect before because my only "equal-sign DNS" tests were being performed against un-resolvable hosts (so were specified with an IP address); and all of my resolvable hosts were only being asked to look up their own name.
I can hack around this defect by defining my host with an IP address, but it would be nice if we could figure out why it's broken and fix it. I'm off to wade through xymonnet.c with my terrible C-parsing skills.
On Wed, April 6, 2016 10:41 am, John Thurston wrote:
- snip -
I can hack around this defect by defining my host with an IP address, but it would be nice if we could figure out why it's broken and fix it. I'm off to wade through xymonnet.c with my terrible C-parsing skills.
Thanks for the investigation (and confirmation). I see the same thing here also. I'll take a closer look at the code base here too... I don't imagine it'll be too difficult a fix.
Regards, -jc