Hello,
Using Xymon 4.3.7 I have noticed that if I reboot the Xymon server then the 'conn' test fails for all the clients. E.g.:
============================
Thu Jul 12 10:24:11 2012 conn NOT ok
Service conn on dns1 is not OK : Host does not respond to ping

System unreachable for 5 poll periods (984 seconds)
============================
If, from the server, I run 'ping' to the client then that works fine. So does fping. If I stop then start the Xymon service on the server then the client conn tests all report ok.
Any ideas about this?
John.
-- John Horne Tel: +44 (0)1752 587287 Plymouth University, UK Fax: +44 (0)1752 587001
How long did you wait between the reboot and restarting Xymon?
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
On Fri, 2012-07-13 at 14:45 +1000, Jeremy Laidman wrote:
How long did you wait between the reboot and restarting Xymon?
Hello,
I have waited various amounts of time, from as soon as I could log in (about a minute or two since rebooting), up to about an hour.
I should have added that after a reboot, when the conn tests are red, they stay red! Yet the clients are all up and running, and are pingable. When I restart Xymon seems to make no difference; once it is done, the tests start to turn green.
I can only assume that there is some initial condition which causes the ping to fail, but that it remains in force until Xymon is restarted. Very odd. I will investigate, but am a little lost as to why, say after 5, 10, 60 (!) mins, the tests do not automatically turn green.
I added 'trace' to one client in hosts.cfg, and it shows the traceroute working fine but the test is still red and saying the ping failed.
John.
-- John Horne Tel: +44 (0)1752 587287 Plymouth University, UK Fax: +44 (0)1752 587001
What's the ping command set to in your server configuration file? Are you using the 'xymonping' command or 'fping'? Make sure that whichever command you are using has the sticky bit set on the actual executable to allow the xymon user to run it.
Steve
On Fri, 2012-07-13 at 10:02 +0100, Steven Carr wrote:
What's the ping command set to in your server configuration file? Are you using the 'xymonping' command or 'fping'? Make sure that whichever command you are using has the sticky bit set on the actual executable to allow the xymon user to run it.
It is set to use fping. The pathname is correct, and the sticky bit is set. I have run fping from the Xymon server as the xymon user and it works fine:
===============================
xymon 17: fping -Ae 141.163.1.250 141.163.177.1
141.163.1.250 is alive (0.43 ms)
141.163.177.1 is alive (0.35 ms)
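The permission check discussed above can also be done programmatically. The following is a minimal Python sketch, assuming that the "sticky bit" mentioned in this thread means the setuid bit (which a ping program typically needs in order to open raw sockets as a non-root user). The function names are hypothetical and are not part of Xymon or fping.

```python
import os
import stat


def describe_special_bits(mode: int) -> dict:
    """Report which special permission bits are set in a st_mode value."""
    return {
        "setuid": bool(mode & stat.S_ISUID),
        "setgid": bool(mode & stat.S_ISGID),
        "sticky": bool(mode & stat.S_ISVTX),
    }


def check_ping_binary(path: str) -> dict:
    """Stat an executable and report its owner uid and special bits."""
    st = os.stat(path)
    info = describe_special_bits(st.st_mode)
    info["owner_uid"] = st.st_uid
    info["executable"] = os.access(path, os.X_OK)
    return info
```

Calling `check_ping_binary()` with the path configured for the ping command would show whether the binary is owned by uid 0 and has the setuid bit set.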
John.
-- John Horne Tel: +44 (0)1752 587287 Plymouth University, UK Fax: +44 (0)1752 587001
Just a WAG: could Xymon be getting started before the network interfaces and be locked onto localhost as a route, or in some other ambiguous networking state? How's it getting started at boot?
On Fri, Jul 13, 2012 at 6:38 PM, John Horne <john.horne at plymouth.ac.uk> wrote:
I should have added that after a reboot, and when the conn tests are red, then they stay red! Yet the clients are all up and running, and are pingable. At what time I restart Xymon seems to make no difference, once it is done then the tests start to turn green.
This symptom is probably significant, but I can't think what might cause it. Once we know, it will all make sense!
Does tcpdump/snoop show the ping packets before the restart of Xymon?
J
Hello,
Sorry, but this turned out to be an SELinux problem. 'fping' is denied write access to files in the ~/server/tmp directory on the Xymon server. However, fping records its results in that directory, and Xymon looks at them to see if a client is alive or not. Since there were no results, because of SELinux, Xymon figured that all the clients were down.
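The failure mode described here can be illustrated with a short Python sketch. This is not Xymon's actual code: it only assumes that reachability is decided by looking for "is alive" lines in the saved fping output (the format shown earlier in this thread), so that an empty results file makes every host look down. The function name and host list are hypothetical.

```python
import re

# Matches lines such as "141.163.1.250 is alive (0.43 ms)".
ALIVE_RE = re.compile(r"^(\S+) is alive\b")


def classify_hosts(expected_hosts, fping_output):
    """Map each expected host to True (alive) or False (no alive line seen)."""
    alive = set()
    for line in fping_output.splitlines():
        match = ALIVE_RE.match(line.strip())
        if match:
            alive.add(match.group(1))
    return {host: host in alive for host in expected_hosts}


hosts = ["141.163.1.250", "141.163.177.1"]

normal_output = (
    "141.163.1.250 is alive (0.43 ms)\n"
    "141.163.177.1 is alive (0.35 ms)\n"
)
print(classify_hosts(hosts, normal_output))

# If the output file could not be written, there is nothing to parse
# and every host is classified as unreachable.
print(classify_hosts(hosts, ""))
```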
I have created a local SELinux policy to allow writes for fping and that seems to work. (I have rebooted the Xymon server and it didn't show any red ping/conn tests.)
The clients don't use 'fping' so they don't have this problem.
Why did restarting the Xymon service (not the server) allow the tests to turn green? Not sure.
Thanks for all the replies.
John.
-- John Horne Tel: +44 (0)1752 587287 Plymouth University, UK Fax: +44 (0)1752 587001
SELinux policies distinguish between appending, writing, and seeking in many cases. I don't recall the details, but I remember needing to futz with different policies to figure out what was going on as well. Was anything interesting going on in the audit logs at the time?
-jc
On Tue, 2012-07-17 at 03:51 -0700, cleaver at terabithia.org wrote:
SELinux policies distinguish between appending, writing, and seeking in many cases. I don't recall the details, but I remember needing to futz with different policies to figure out what was going on as well. Was anything interesting going on in the audit logs at the time?
Hi,
Nothing else was going on in the logs at the time that the fpings were stopped. The log showed that it was a write denial:
=============================
type=AVC msg=audit(1342195229.681:349): avc: denied { write } for pid=25973 comm="fping" path="/home/xymon/server/tmp/ping-stderr.25955.00" dev=sdb1 ino=1587865 scontext=system_u:system_r:ping_t:s0 tcontext=system_u:object_r:user_home_t:s0 tclass=file
Using audit2allow to create a policy allowing writes in 'tmp' solved the problem.
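For reading denials like the one above, a small Python sketch can pull out the fields that matter. It assumes the key=value layout shown in that log line; the function name is hypothetical, and this is only a reading aid, not a replacement for audit2allow.

```python
import re

AVC_LINE = (
    'type=AVC msg=audit(1342195229.681:349): avc: denied { write } for '
    'pid=25973 comm="fping" '
    'path="/home/xymon/server/tmp/ping-stderr.25955.00" dev=sdb1 '
    'ino=1587865 scontext=system_u:system_r:ping_t:s0 '
    'tcontext=system_u:object_r:user_home_t:s0 tclass=file'
)


def parse_avc(line):
    """Extract the denied permissions and key=value fields from an AVC record."""
    fields = {}
    perms = re.search(r"denied\s+\{([^}]*)\}", line)
    if perms:
        fields["denied"] = perms.group(1).split()
    # Values are either double-quoted strings or bare tokens.
    for key, quoted, bare in re.findall(r'(\w+)=(?:"([^"]*)"|(\S+))', line):
        fields[key] = quoted if quoted else bare
    return fields


record = parse_avc(AVC_LINE)
# The SELinux type is the third colon-separated part of a context.
source_type = record["scontext"].split(":")[2]
target_type = record["tcontext"].split(":")[2]
print(f'{record["comm"]} ({source_type}) was denied {record["denied"]} '
      f'on {record["path"]} ({target_type})')
```

Here the source type ping_t and the target type user_home_t show that the fping process was blocked from writing to a file labelled as home-directory content, which fits the location of the tmp directory under /home/xymon.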
John.
-- John Horne Tel: +44 (0)1752 587287 Plymouth University, UK Fax: +44 (0)1752 587001
participants (5)
- cleaver@terabithia.org
- hobbit@epperson.homelinux.net
- jlaidman@rebel-it.com.au
- john.horne@plymouth.ac.uk
- sjcarr@gmail.com