Henrik> I assume the both monitor3 and monitor5 are running the network tests
Henrik> (the [xymonnet] task), and they are the same version?
Yes to both questions:
[xymon at monitor5 ~]$ server/bin/xymoncmd xymonnet --version
2011-08-09 10:28:49 Using default environment file /home/xymon/server/etc/xymonserver.cfg
xymonnet version 4.3.2
SSL library : OpenSSL 0.9.8e-rhel5 01 Jul 2008
LDAP library: OpenLDAP 20343
[xymon at monitor5 ~]$
[xymon at monitor3 ~]$ server/bin/xymoncmd xymonnet --version
2011-08-09 10:27:51 Using default environment file /home/xymon/server/etc/xymonserver.cfg
xymonnet version 4.3.2
SSL library : OpenSSL 0.9.8e-rhel5 01 Jul 2008
LDAP library: OpenLDAP 20343
[xymon at monitor3 ~]$
Henrik> Could you provide the output from running
Henrik>
Henrik> xymoncmd xymonnet --no-update --debug camilla
Henrik>
Henrik> on the two Xymon servers when it fails ?
Yes, I set up a test host (tesla) to simulate the problem (delay + bad HTTP status) that I had with the production host (camilla) and ran:
[xymon at monitor3 ~]$ server/bin/xymoncmd xymonnet --no-update --debug tesla
...on both monitoring servers. I have attached the results and the contents of '/home/xymon/data/hist/tesla'.
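For anyone comparing the two attached debug logs, the timestamps at the start of the log lines differ between runs and clutter a plain diff. Below is a rough sketch of how the logs could be compared once those timestamps are stripped. The function names (strip_ts, compare_logs) and the log file names are made up for illustration; the timestamp pattern is taken from the "2011-08-09 10:28:49" prefix seen in the --version output above, and I am assuming the debug output uses the same prefix.

```shell
#!/bin/sh
# strip_ts (hypothetical helper): remove a leading "YYYY-MM-DD HH:MM:SS "
# timestamp from each line read on stdin. Lines without one pass through.
strip_ts() {
    sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} //'
}

# compare_logs FILE1 FILE2 (hypothetical helper): diff two saved debug logs
# after stripping timestamps. Returns diff's exit status (0 = identical).
compare_logs() {
    a=$(mktemp) && b=$(mktemp) || return 2
    strip_ts < "$1" > "$a"
    strip_ts < "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```

To use it, save the output on each server first, for example with "server/bin/xymoncmd xymonnet --no-update --debug tesla > tesla-monitor3.log 2>&1" (and likewise on monitor5), copy both files to one machine, and run "compare_logs tesla-monitor3.log tesla-monitor5.log". Whatever lines remain are the real differences between the two runs.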
The hosts.cfg entries are:
192.168.0.2 tesla # ssh http://tesla httpstatus=dbB;http://tesla/cgi/dbB.pl;200; httpstatus=dbC;http://tesla/cgi/dbC.pl;200;
The only difference that I know of between the monitoring servers is that monitor5 has an 'analysis.cfg' with:
HOST=tesla
    DS http %.*http.*:sec >0.660 COLOR=yellow TEXT="Time exceeds &U at &V seconds."
With the above analysis.cfg directive on monitor5, the status is reported as yellow; without it, the status is reported as red.
It looks like the analysis.cfg entry causes Xymon to mask the red with the yellow.
Any help understanding this would be appreciated.
- Troy