Two procs/processes graph issues after server upgrade from 4.3.21 to 4.3.22-rc2
Hi,
[TL;DR: See Summary at the end.]
I'm slowly running out of ideas with the following issue which has been noticed after I rolled out 4.3.22-rc2 on our two monitoring servers (still running the servers on 4.3.22-rc2 at the moment):
The graph on https://xymon.phys.ethz.ch/xymon-cgi/svcstatus.sh?HOST=zwoelfi&SERVICE=procs is no longer there, because https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce... returns only a 1x1 pixel PNG. The same happens on the second (independent, not slave) server, too.
(No version changes on the affected clients. Those I checked have either 4.3.0-beta2 from Debian 7 or 4.3.17 from Debian 8.)
I've found the following messages upon reloading the above URL in Apache's error.log:
2015-11-11 12:32:38.839801 Sendto failed: Connection refused
2015-11-11 12:32:38.839853 Sendto failed: Connection refused
2015-11-11 12:32:38.839871 Sendto failed: Connection refused
I found http://lists.xymon.com/archive/2015-February/041189.html, which mentions these messages, then stopped the xymon service, removed all leftover rrdctl.* files from /var/lib/xymon/tmp/, and started the xymon service again.
Result: I still only get a 1x1 pixel PNG, but the error messages are gone, i.e. the two issues are likely unrelated, as they were in the mailing list posting above.
Then again, on https://xymon.phys.ethz.ch/xymon-cgi/svcstatus.sh?HOST=zwoelfi&SERVICE=trend... the "Process counts" graph is there (but does not seem to be working):
https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce...
The differences between this and the first URL are (besides the time stamps): the first URL has nostale (without a value) and color=green as additional query string parameters, while the second URL instead has first=1 and count=4.
As soon as I remove the valueless "nostale" or give it a value, e.g. "nostale=1", the graph is back again (but still not working).
So while the (reduced to the minimum parameters) URL https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce... shows (an empty) graph, https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce... gives a 1x1 pixel.
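To illustrate why a bare "nostale" and "nostale=1" can end up being treated differently, here is a minimal Python sketch of how one common query-string parser handles a parameter without a value. This is only an analogy: it is not Xymon's showgraph CGI code (which is not shown in this thread), and the host and service values in the query strings are placeholders, not the real URLs above.

```python
from urllib.parse import parse_qs

# Hypothetical query strings; host and service values are placeholders.
BARE = "host=examplehost&service=example&nostale&color=green"
WITH_VALUE = "host=examplehost&service=example&nostale=1&color=green"


def has_flag(query: str, name: str, keep_blank: bool) -> bool:
    """Return True if the parser reports `name` as present in `query`."""
    # With keep_blank_values=False (the default), parameters that have
    # no value are silently dropped from the result.
    params = parse_qs(query, keep_blank_values=keep_blank)
    return name in params


if __name__ == "__main__":
    for label, query in (("bare", BARE), ("with value", WITH_VALUE)):
        for keep_blank in (False, True):
            print(label, keep_blank, has_flag(query, "nostale", keep_blank))
```

Whether a valueless flag counts as "set" is a parser decision, so the same URL can behave differently depending on how the CGI reads its parameters.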
Regarding the empty graph: https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce... is not actually empty; it just shows that there has been no data since the 4th of November (when I updated the servers from 4.3.21 to 4.3.22-rc2).
And indeed, in /var/lib/xymon/rrd/zwoelfi/, some files have not been updated since the 4th of November:
ls -l *proc*
-rw-r--r-- 1 xymon xymon 19640 Nov  4 15:40 processes.apache2.rrd
-rw-r--r-- 1 xymon xymon 19640 Nov  4 15:40 processes.automount.rrd
-rw-r--r-- 1 xymon xymon 19640 Nov  4 15:40 processes.stress.rrd
-rw-r--r-- 1 xymon xymon 19640 Nov 11 13:09 procs.rrd
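The same check can be scripted instead of reading timestamps by eye. The following is a rough Python sketch that lists RRD files whose modification time is older than a chosen threshold. The function name stale_rrds and the one-day threshold are assumptions made for this example; they are not part of Xymon.

```python
import time
from pathlib import Path


def stale_rrds(directory, max_age_seconds, now=None):
    """Return sorted names of *.rrd files not modified within max_age_seconds."""
    # `now` can be passed in explicitly to make the check reproducible.
    now = time.time() if now is None else now
    stale = []
    for path in Path(directory).glob("*.rrd"):
        # st_mtime is the last modification time in seconds since the epoch.
        if now - path.stat().st_mtime > max_age_seconds:
            stale.append(path.name)
    return sorted(stale)


if __name__ == "__main__":
    # Directory taken from the listing above; the threshold is an arbitrary choice.
    for name in stale_rrds("/var/lib/xymon/rrd/zwoelfi", 24 * 3600):
        print(name)
```

Run against the directory above, this would be expected to report the processes.*.rrd files but not procs.rrd, as long as procs.rrd is still being updated.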
Summary
So there seem to be two issues with 4.3.22:
1. The graph on the procs check's page isn't displayed properly.
   Either
   - "nostale" should get a value in the page/template,
   - or the parsing of the "nostale" parameter without a value in the showgraph CGI
   should be fixed. This sounds rather easy, but I'm not sure which variant is the expected one.
2. For some reason the processes.*.rrd files defined by "TRACK=" in analysis.cfg are no longer updated.
   I currently have no good idea where this comes from. Maybe from one of the NCV-related changes. At least I found no configuration change (be it local or in the defaults/templates) which could have triggered this issue.
Kind regards, Axel Beckert
-- Axel Beckert <beckert at phys.ethz.ch> support: +41 44 633 26 68 IT Services Group, HPT H 6 voice: +41 44 633 41 89 Departement of Physics, ETH Zurich CH-8093 Zurich, Switzerland http://nic.phys.ethz.ch/
Hi,
On Wed, Nov 11, 2015 at 06:54:13PM +0100, Axel Beckert wrote:
> I'm slowly running out of ideas with the following issue which has been noticed after I rolled out 4.3.22-rc2 on our two monitoring servers (still running the servers on 4.3.22-rc2 at the moment):
JFTR: Upgrading them to the final 4.3.22 didn't make a difference so far.
Kind regards, Axel Beckert
Thanks, I believe this is the same issue that Steven was facing. I'm looking into it now. It's definitely not present in 4.3.21.
-jc
Hi,
On Wed, Nov 11, 2015 at 12:20:37PM -0800, Japheth Cleaver wrote:
> Thanks, I believe this is the same issue that Steven was facing.
Yeah, saw it just after I'd sent my mail.
> I'm looking into it now. It's definitely not present in 4.3.21.
Sorry for not reporting it earlier (i.e. before the 4.3.22 final release). A coworker made me aware of it on Thursday last week, but I didn't get far with examining the issue on that day -- and I was not in the office on Friday. I also first expected a configuration issue caused by merging our config with the changed 4.3.22 config (changes around GRAPHS_*, etc.)...
On Wed, Nov 11, 2015 at 03:53:48PM -0800, Japheth Cleaver wrote:
> The chief problem was that TRACK and OPTIONAL seemed to not be tracked as options to a test as a result of r7683 and r7686 (on some platforms).
Our servers are both x86_64 (called amd64 in Debian) btw.
> Secondarily, 'nostale' is the default on svcstatus.sh pages, wherein it will eventually not display an old RRD page on the status -- in this case, because it hadn't been updated recently.
Ah! So that's a followup issue. Didn't think of that although it seems obvious now due to the name "no stale".
> I'm not sure how I feel about the latter issue, but it's been that way for a while.
Well, the main issue is that there was no graph data displayed. If fixing the RRD update also brings the graphs back, that's fine.
Nevertheless, I'd definitely prefer an empty graph in such a case over a 1x1-pixel/invisible/no graph.
If there's suddenly no graph anymore, I'd generally expect a bug in the software. (Yes, I said the opposite above, but that's basically because there was a lot of configuration to merge -- which tends to be error-prone. But then again, that general feeling was right in this case. :-)
If there's an empty graph, I'd expect issues with reporting in general and would consider it a local issue which my monitoring reported properly.
> I believe the included patch fixes the main issue; I'm testing now (as 4.3.22-5 in http://terabithia.org/rpms/xymon/testing/el6/x86_64/).
> This is enough to warrant a 4.3.23 release shortly, upon confirmation.
Will apply that patch to the 4.3.22 Debian packages, and roll them out locally. Will report back as soon as I can confirm or deny that the fix helps.
Kind regards, Axel Beckert
Hi,
sorry, wasn't able to test it properly earlier today.
On Thu, Nov 12, 2015 at 11:09:17AM +0100, Axel Beckert wrote:
> > I believe the included patch fixes the main issue; I'm testing now (as 4.3.22-5 in http://terabithia.org/rpms/xymon/testing/el6/x86_64/).
> > This is enough to warrant a 4.3.23 release shortly, upon confirmation.
> Will apply that patch to the 4.3.22 Debian packages, and roll them out locally. Will report back as soon as I can confirm or deny that the fix helps.
Indeed the fix seems to help:
https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=proce...
Thanks for the patch!
Kind regards, Axel Beckert
Perfect. That works for me, then. I'll push a 4.3.23 shortly.
Regards, -jc
There's a primary and a secondary issue here.
The chief problem was that TRACK and OPTIONAL seemed to not be tracked as options to a test as a result of r7683 and r7686 (on some platforms). Secondarily, 'nostale' is the default on svcstatus.sh pages, wherein it will eventually not display an old RRD page on the status -- in this case, because it hadn't been updated recently. I'm not sure how I feel about the latter issue, but it's been that way for a while.
I believe the included patch fixes the main issue; I'm testing now (as 4.3.22-5 in http://terabithia.org/rpms/xymon/testing/el6/x86_64/).
This is enough to warrant a 4.3.23 release shortly, upon confirmation.
-jc