Hi,
[TL;DR: See Summary at the end.]
I'm slowly running out of ideas on the following issue, which I noticed after rolling out 4.3.22-rc2 on our two monitoring servers (both are still running 4.3.22-rc2 at the moment):
The graph on https://xymon.phys.ethz.ch/xymon-cgi/svcstatus.sh?HOST=zwoelfi&SERVICE=procs is no longer there, because https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce... returns only a 1x1 pixel PNG. The same happens on the second (independent, not slave) server, too.
(No version changes on the affected clients. Those I checked have either 4.3.0-beta2 from Debian 7 or 4.3.17 from Debian 8.)
I've found the following messages upon reloading the above URL in Apache's error.log:
2015-11-11 12:32:38.839801 Sendto failed: Connection refused
2015-11-11 12:32:38.839853 Sendto failed: Connection refused
2015-11-11 12:32:38.839871 Sendto failed: Connection refused
I found http://lists.xymon.com/archive/2015-February/041189.html, which mentions these messages, so I stopped the xymon service, removed all leftover rrdctl.* files from /var/lib/xymon/tmp/ and started the xymon service again.
The result: I still only get a 1x1 pixel PNG, but the error messages are gone, so the two issues are likely unrelated, just as in the mailing list posting above.
Then again, on https://xymon.phys.ethz.ch/xymon-cgi/svcstatus.sh?HOST=zwoelfi&SERVICE=trend... the "Process counts" graph is there (but does not seem to be working):
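To tell the 1x1 pixel placeholder apart from a real graph without opening it in a browser, the image dimensions can be read from the PNG header. This is only a minimal sketch: the function names are made up for illustration, and the image bytes are assumed to come from a saved copy of the showgraph.sh response.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes):
    """Return (width, height) read from the IHDR chunk of PNG data."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # Width and height are big-endian 32-bit integers at offsets 16 and 20.
    return struct.unpack(">II", data[16:24])


def is_placeholder(data: bytes) -> bool:
    """True if the image is a 1x1 pixel PNG (hypothetical helper)."""
    return png_size(data) == (1, 1)


def fake_png_header(width: int, height: int) -> bytes:
    """Build only the first 24 bytes of a PNG, enough for png_size()."""
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", width, height)


if __name__ == "__main__":
    print(is_placeholder(fake_png_header(1, 1)))
    print(png_size(fake_png_header(576, 120)))
```

The 576x120 size in the demo is an arbitrary example value, not the size Xymon uses.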
https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce...
The differences between this and the first URL are (besides the time stamps): The first URL has nostale (without a value) and color=green as additional query string parameters, while the second URL has first=1 and count=4 instead.
As soon as I remove the "nostale" without a value, or give it a value such as "nostale=1", the graph is back again (but still not working).
So while the (reduced to the minimum parameters) URL https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce... shows an (empty) graph, https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce... gives a 1x1 pixel PNG.
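As an illustration of how a parameter without a value can trip up a query string parser, here is a small sketch using Python's standard library. It is not the showgraph CGI's own parsing code (which I have not shown here); the query string is a shortened, hypothetical one built from the parameters named above.

```python
from urllib.parse import parse_qs

# Hypothetical, shortened query string; only parameters mentioned above.
query = "host=zwoelfi&nostale&color=green"

# Default behaviour: keys without a value are silently dropped.
strict = parse_qs(query)

# With keep_blank_values=True the bare key survives with an empty value.
lenient = parse_qs(query, keep_blank_values=True)

# With an explicit value the key is kept either way.
with_value = parse_qs("host=zwoelfi&nostale=1&color=green")

print("nostale" in strict)      # False
print(lenient["nostale"])       # ['']
print(with_value["nostale"])    # ['1']
```

A parser that treats the bare "nostale" differently from "nostale=1" would explain why adding a value changes the result.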
With regard to the empty graph: https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce... is not empty, it just shows that there has been no more data since the 4th of November (when I updated the servers from 4.3.21 to 4.3.22-rc2).
And indeed, in /var/lib/xymon/rrd/zwoelfi/, not all files have been updated since the 4th of November:
ls -l *proc*
-rw-r--r-- 1 xymon xymon 19640 Nov  4 15:40 processes.apache2.rrd
-rw-r--r-- 1 xymon xymon 19640 Nov  4 15:40 processes.automount.rrd
-rw-r--r-- 1 xymon xymon 19640 Nov  4 15:40 processes.stress.rrd
-rw-r--r-- 1 xymon xymon 19640 Nov 11 13:09 procs.rrd
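To find such stale files across many hosts without eyeballing "ls -l" output, the modification times can be compared against a threshold. A minimal sketch follows; the function name and the one-hour default threshold are my assumptions, not anything Xymon defines.

```python
import fnmatch
import os
import time


def stale_files(directory, pattern="*.rrd", max_age_seconds=3600, now=None):
    """Return sorted names of matching files not modified within max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for name in os.listdir(directory):
        if not fnmatch.fnmatch(name, pattern):
            continue
        # st_mtime is the last modification time, the date "ls -l" displays.
        mtime = os.path.getmtime(os.path.join(directory, name))
        if now - mtime > max_age_seconds:
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    rrd_dir = "/var/lib/xymon/rrd/zwoelfi"  # directory from the listing above
    if os.path.isdir(rrd_dir):
        for name in stale_files(rrd_dir, "*proc*"):
            print(name)
```

Run against the directory above, this should list the three processes.*.rrd files but not procs.rrd, as long as the latter keeps being updated.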
Summary
So there seem to be two issues with 4.3.22:
1. The graph on the procs check's page isn't displayed properly.

   Either
   - "nostale" should get a value in the page/template,
   - or the parsing of the "nostale" parameter without a value in the showgraph CGI
     should be fixed.
   This sounds rather easy, but I'm not sure which variant is the expected one.

2. For some reason, the processes.*.rrd files defined by "TRACK=" in analysis.cfg are no longer updated.

   Here I currently have no good idea where this comes from. Maybe from one of the NCV-related changes. At least I found no configuration change (be it local or in the defaults/templates) which could have triggered this issue.
Kind regards, Axel Beckert
--
Axel Beckert <beckert at phys.ethz.ch>       support: +41 44 633 26 68
IT Services Group, HPT H 6                  voice: +41 44 633 41 89
Departement of Physics, ETH Zurich
CH-8093 Zurich, Switzerland                 http://nic.phys.ethz.ch/