Xymon reports excessive memory usage on 1 SLES 11 host
Hi,
I have a weird problem on one of my SLES 11 hosts:
Tue Dec 14 03:00:52 CET 2010 - Memory CRITICAL
Memory      Used              Total    Percentage
Physical    3803M             3829M    99%
Actual      17592186044128M   3829M    459445966156%
Swap        0M                2055M    0%
As you can see, it reports excessive memory usage, so the status turns red, and this happens twice each night. It's the 4.2.3 client, compiled on SLES 11.
Any ideas?
Regards,
Carl Melgaard
On Tue, 14 Dec 2010 09:46:20 +0100, Carl Melgaard wrote:
> I have a weird problem on one of my SLES 11 hosts:
> Tue Dec 14 03:00:52 CET 2010 - Memory CRITICAL
> Memory      Used              Total    Percentage
> Physical    3803M             3829M    99%
> Actual      17592186044128M   3829M    459445966156%
> Swap        0M                2055M    0%
> as you can see, it reports an excessive memory usage
Could you show me the client data behind this report? Assuming you have the "hostdata" task running, it should be available from the historical status log via the "Client data available" link near the bottom of the page.
It is the "[free]" section that is interesting for the memory report.
Also, what version of Xymon are you running on your Xymon server?
Regards, Henrik
Hi,
> > as you can see, it reports an excessive memory usage
> Could you show me the client data behind this report? Assuming you have the "hostdata" task running, it should be available from the historical status log via the "Client data available" link near the bottom of the page.
> It is the "[free]" section that is interesting for the memory report.
Here are two separate [free] sections (from the history) that resulted in red alerts:
[free]
             total       used       free     shared    buffers     cached
Mem:       3921396    3894772      26624          0     302132    3887292
-/+ buffers/cache: 18014398509187332    4216048
Swap:      2104472        904    2103568
- and
[free]
             total       used       free     shared    buffers     cached
Mem:       3921396    3851576      69820          0     319192    3903296
-/+ buffers/cache: 18014398509111072    4292308
Swap:      2104472        904    2103568
- from two different days.
> Also, what version of Xymon are you running on your Xymon server?
I'm actually running 4.4.0-1 on the server; I was feeling bold when I first implemented Xymon.
Regards,
Carl Melgaard
I've seen the same strange reports from 4.2.0. The reporting clients are Solaris 10 zones on x86, for example:
red Physical 4294953114M 16384M 4294967210%
Since it's not a large-scale issue for me, I haven't spent any time looking into it. For me it's a new issue that appears only on recent zone deployments (in the last year or so), so I figured some version of shell tool xyz may be the culprit. I can look further or provide additional data if needed. The point is that the issue shows up from 4.2.0 through 4.4, but it's new for me, so I'm not sure Hobbit is at fault.
Regards,
Tim
That looks like output from the 'free' command. Try going to the server and entering
free
and see what it says. Xymon doesn't alter the output at all; it just passes on whatever comes out. Here's the relevant part of xymonclient-linux.sh, with context:
echo "[mount]"
mount
echo "[free]"
free
echo "[ifconfig]"
/sbin/ifconfig
Ralph Mitchell
Hi,
"free" just gives normal output when I run it by hand, but apparently it sometimes reports bogus data on this one host. Shrug. It's the same version running across all the hosts. Thanks for looking into it, though.
Regards,
Carl Melgaard
From: Ralph Mitchell [mailto:ralphmitchell at gmail.com] Sent: 14 December 2010 18:57 To: xymon at xymon.com Subject: Re: [xymon] Xymon reports excessive memory usage on 1 SLES 11 host
On Tue, 14 Dec 2010 11:16:20 +0100, Carl Melgaard wrote:
> > Could you show me the client data behind this report? Assuming you have the "hostdata" task running, it should be available from the historical status log via the "Client data available" link near the bottom of the page.
> > It is the "[free]" section that is interesting for the memory report.
> Here are two separate [free] sections (from history), that resulted in red alerts:
> [free]
>              total       used       free     shared    buffers     cached
> Mem:       3921396    3894772      26624          0     302132    3887292
> -/+ buffers/cache: 18014398509187332    4216048
> Swap:      2104472        904    2103568
> - and
> [free]
>              total       used       free     shared    buffers     cached
> Mem:       3921396    3851576      69820          0     319192    3903296
> -/+ buffers/cache: 18014398509111072    4292308
> Swap:      2104472        904    2103568
I have to plead "not guilty" on behalf of Xymon, then. The data reported by "free" in the "-/+ buffers/cache" line is obviously bogus, but it is what Xymon uses for the "Actual" memory calculations. If Xymon gets bogus data, you will also get bogus results.
Regards, Henrik
What does the Solaris client use to get this data? vmstat? (free is not a native Solaris tool.)
From: Henrik Størner [henrik at hswn.dk] Sent: Tuesday, December 14, 2010 1:29 PM To: xymon at xymon.com Subject: Re: SV: [xymon] Xymon reports excessive memory usage on 1 SLES 11 host
On Tue, 14 Dec 2010 13:42:21 -0800, Tim McCloskey wrote:
> What does the Solaris client use to get this data? vmstat? (free is not a native solaris tool).
Each OS has its own way of reporting memory utilisation. It is completely non-standard, and it is the one part of the Xymon client that requires the most code for each new OS!
Specifically for Solaris, Xymon uses prtconf to determine how much memory is installed, and vmstat to determine how much is being used. "swap -s" was used for determining how much swap was being used, but earlier today I committed an update so we will now use "swap -l" instead.
Regards, Henrik
Henrik,
Thanks for the speedy answer. I had already seen some of this fun in hobbitd/client/$clients.c. You must enjoy porting that part of the project each time some OS makes a change :)
Trivia: in Solaris 10 zones, prtconf can still report the installed "Memory size:", but anything further (like prtdiag) fails:
System Configuration: Sun Microsystems i86pc
Memory size: 32768 Megabytes
System Peripherals (Software Nodes):
prtconf: devinfo facility not available
Regards,
Tim
From: Henrik Størner [henrik at hswn.dk] Sent: Tuesday, December 14, 2010 1:50 PM To: xymon at xymon.com Subject: Re: SV: [xymon] Xymon reports excessive memory usage on 1 SLES 11 host
Not much point in doing memory monitoring on a Solaris sparse zone. You might as well put a check in the client script to not collect memory info for a sparse zone with capped memory. See here for more info: http://www.xymon.com/archive/2010/02/msg00213.html
Regards, Vernon
On Wed, Dec 15, 2010 at 6:11 AM, Tim McCloskey <tm at freedom.com> wrote:
Thanks, Vernon. I know that what we see is not 100% accurate, but the majority of my zones are used for only one purpose (only one child zone per physical server), so for me the data provided from the zone is useful. We don't really look at exact numbers; we watch trends and look for quick spikes.
From: Vernon Everett [everett.vernon at gmail.com] Sent: Tuesday, December 14, 2010 7:16 PM To: xymon at xymon.com Subject: Re: SV: [xymon] Xymon reports excessive memory usage on 1 SLES 11 host
> > [free]
> >              total       used       free     shared    buffers     cached
> > Mem:       3921396    3851576      69820          0     319192    3903296
> > -/+ buffers/cache: 18014398509111072    4292308
> > Swap:      2104472        904    2103568
> I have to plead "not guilty" on behalf of Xymon, then. The data reported by "free" in the "+/- buffers/cache" line is obviously bogus - but it is what Xymon uses for the "Actual" memory calculations. If Xymon gets bogus data, then you will also have bogus results.
Yes, that's understandable. Is there any way I can NOT trigger notifications on these bogus alerts? Disable the MEMACT check for that host?
Regards,
Carl Melgaard
In <3BD667CCFBD0D04CA2BC3D57D01B64264CA04A6083 at HORMXB103VM1.onerm.dk> Carl Melgaard <Carl.Melgaard at STAB.RM.DK> writes:
> Yes, thats understandable. Is there any way I can NOT trigger notification on these bogus alerts? Disable the MEMACT check for that host?
Not in the code you have. But it seems reasonable to add some sort of sanity check in the memory-status handler, so I've done that: it now acts on the data only when the percent-used is at most 100%. So a) you will not get alerts from the bogus data, and b) you can disable all memory alerts by setting a threshold greater than 100. The patch below should apply to 4.3.0-beta3.
Regards, Henrik
Index: xymond/xymond_client.c
===================================================================
--- xymond/xymond_client.c	(revision 6590)
+++ xymond/xymond_client.c	(working copy)
@@ -883,17 +883,19 @@
 	get_memory_thresholds(hinfo, clientclass, &physyellow, &physred, &swapyellow, &swapred, &actyellow, &actred);
 
 	memphyspct = (memphystotal > 0) ? ((100 * memphysused) / memphystotal) : 0;
-	if (memphyspct > physyellow) physcolor = COL_YELLOW;
-	if (memphyspct > physred) physcolor = COL_RED;
+	if (memphyspct <= 100) {
+		if (memphyspct > physyellow) physcolor = COL_YELLOW;
+		if (memphyspct > physred) physcolor = COL_RED;
+	}
 
-	if (memswapused != -1) {
-		memswappct = (memswaptotal > 0) ? ((100 * memswapused) / memswaptotal) : 0;
+	if (memswapused != -1) memswappct = (memswaptotal > 0) ? ((100 * memswapused) / memswaptotal) : 0;
+	if (memswappct <= 100) {
 		if (memswappct > swapyellow) swapcolor = COL_YELLOW;
 		if (memswappct > swapred) swapcolor = COL_RED;
 	}
 
-	if (memactused != -1) {
-		memactpct = (memphystotal > 0) ? ((100 * memactused) / memphystotal) : 0;
+	if (memactused != -1) memactpct = (memphystotal > 0) ? ((100 * memactused) / memphystotal) : 0;
+	if (memactpct <= 100) {
 		if (memactpct > actyellow) actcolor = COL_YELLOW;
 		if (memactpct > actred) actcolor = COL_RED;
 	}
@@ -927,14 +929,24 @@
 	addtostatus(msgline);
 
 	if (memactused != -1) {
-		sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n",
-			colorname(actcolor), "Actual", memactused, memphystotal, memactpct);
+		if (memactpct <= 100)
+			sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n",
+				colorname(actcolor), "Actual", memactused, memphystotal, memactpct);
+		else
+			sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%% - invalid data\n",
+				colorname(COL_CLEAR), "Actual", memactused, memphystotal, 0);
+
 		addtostatus(msgline);
 	}
 
 	if (memswapused != -1) {
-		sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n",
-			colorname(swapcolor), "Swap", memswapused, memswaptotal, memswappct);
+		if (memswappct <= 100)
+			sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n",
+				colorname(swapcolor), "Swap", memswapused, memswaptotal, memswappct);
+		else
+			sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%% - invalid data\n",
+				colorname(COL_CLEAR), "Swap", memswapused, memswaptotal, 0);
+
 		addtostatus(msgline);
 	}
 	if (fromline && !localmode) addtostatus(fromline);
Participants (5)
- Carl.Melgaard@STAB.RM.DK
- everett.vernon@gmail.com
- henrik@hswn.dk
- ralphmitchell@gmail.com
- tm@freedom.com