Forgot to reply all.
On 2015-04-14 10:00 am, Mike Burger wrote:
On 2015-04-14 7:56 am, Steve Hill wrote:
I'm working on improving my Xymon configuration to reduce the number of false alerts that we get. In particular, memory monitoring is a bit of a problem so I'm hoping someone will be able to offer some advice.
At the moment, Xymon is set up with something like:
MEMPHYS 100 101
MEMSWAP 20 40
MEMACT 95 97
I pretty much don't care about MEMPHYS. The problem with MEMSWAP and MEMACT is that they work independently of each other, i.e. the above will give me an alert if > 97% of the RAM is used OR > 40% of swap is used.
However, this results in warnings for systems that have a lot of idle data in memory. The Linux kernel will page out idle data (increasing swap usage and reducing RAM usage) and use that space for buffers/caches, which is a very sensible strategy. Unfortunately, Xymon then comes along, notices that there's lots of swap in use and throws an alert, even though there's plenty of RAM free.
Basically, I don't care that a machine is 4GB into swap if it has 5GB of free RAM. That isn't a problem; it just means there's quite a lot of idle data that the kernel has decided can be paged out. I do care if it's 4GB into swap and only has 0.5GB of free RAM, since this would indicate that it's actually short of memory.
What I really need is to warn if > x% of the RAM is used AND > y% of swap is used - is there a way to do that?
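To make the requested AND rule concrete, here is a minimal Python sketch. It is not a Xymon feature; it assumes the RAM and swap percentages are already known, and the warn/panic levels are hypothetical placeholders. The two calls at the end replay the 4GB-into-swap cases above on an assumed 16GB RAM / 8GB swap host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Levels:
    """Hypothetical warn/panic percentages, not taken from any Xymon config."""
    ram_warn: float = 90.0
    ram_panic: float = 97.0
    swap_warn: float = 20.0
    swap_panic: float = 40.0


def combined_color(ram_pct: float, swap_pct: float, levels: Levels = Levels()) -> str:
    """Return a Xymon-style colour, escalating only when RAM AND swap are both high."""
    if ram_pct > levels.ram_panic and swap_pct > levels.swap_panic:
        return "red"
    if ram_pct > levels.ram_warn and swap_pct > levels.swap_warn:
        return "yellow"
    return "green"


def pct_used(total_gb: float, free_gb: float) -> float:
    """Percentage of a resource in use, given its total and free amounts."""
    return 100.0 * (total_gb - free_gb) / total_gb


# Assumed host: 16 GB RAM, 8 GB swap, 4 GB of swap in use in both cases.
swap = pct_used(8, 4)  # 50% of swap used
print(combined_color(pct_used(16, 5), swap))    # 5 GB RAM free -> green
print(combined_color(pct_used(16, 0.5), swap))  # 0.5 GB RAM free -> yellow
```

With these levels the "lots of idle data swapped out" host stays green, while the host that is genuinely short of RAM goes yellow. The independent MEMSWAP 20 40 check would flag both.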
I'll say that I've never run into this. I've never had a system swap memory out to disk unless active memory was utilized at a high percentage, in either AIX or Linux.
In AIX, there is some sort of algorithm in place where, if a process' memory has been swapped out and then swapped back in, the memory manager holds onto the paging space until either something else needs it or the previously swapped-out process ends. In Linux, though, I don't think I've ever seen a situation where idle memory pages were swapped to disk while a large percentage of physical/active memory was free.
Now, on the other side of this, to take a stab at the question: I'd wager that, at present, you'd need to script such a test/alert. But I would agree that it would be useful to be able to set an "alarm if this or this" or an "alarm if this and this" type of scenario. At present, the only tests I can think of that allow this "out of the box" are the process monitors, where you can set minimum and maximum thresholds.
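A rough sketch of what such a script might compute on a Linux client, in Python. Several assumptions: the host exposes /proc/meminfo with a MemAvailable field (newer kernels), MemAvailable is taken as "RAM usable without swapping", and the test name "memcombo" and the thresholds are hypothetical. The sketch only builds the status text. Delivering it (e.g. through the client's `xymon` utility) is left out, and the commas-for-dots host name follows the usual Xymon status-message convention.

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def usage_pcts(info: dict[str, int]) -> tuple[float, float]:
    """Return (RAM % used, swap % used) from parsed meminfo fields."""
    ram_total = info["MemTotal"]
    # Assumption: MemAvailable approximates RAM usable without swapping.
    ram_pct = 100.0 * (ram_total - info["MemAvailable"]) / ram_total
    swap_total = info.get("SwapTotal", 0)
    swap_pct = 0.0
    if swap_total:
        swap_pct = 100.0 * (swap_total - info.get("SwapFree", swap_total)) / swap_total
    return ram_pct, swap_pct


def status_message(host: str, info: dict[str, int],
                   ram_warn: float = 90.0, ram_panic: float = 97.0,
                   swap_warn: float = 20.0, swap_panic: float = 40.0) -> str:
    """Build a status line for a hypothetical 'memcombo' test (AND logic)."""
    ram, swap = usage_pcts(info)
    if ram > ram_panic and swap > swap_panic:
        color = "red"
    elif ram > ram_warn and swap > swap_warn:
        color = "yellow"
    else:
        color = "green"
    xymon_host = host.replace(".", ",")
    return (f"status {xymon_host}.memcombo {color} "
            f"RAM {ram:.1f}% used, swap {swap:.1f}% used")
```

On a client this might be called as `status_message(socket.getfqdn(), parse_meminfo(open("/proc/meminfo").read()))`. Keeping the parsing and the colour decision in plain functions makes it easy to try the thresholds against saved meminfo snapshots before wiring anything into Xymon.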
-- Mike Burger http://www.bubbanfriends.org
"It's always suicide-mission this, save-the-planet that. No one ever just stops by to say 'hi' anymore." --Colonel Jack O'Neill, SG1