I'm working on improving my Xymon configuration to reduce the number of false alerts that we get. In particular, memory monitoring is a bit of a problem so I'm hoping someone will be able to offer some advice.
At the moment, Xymon is set up with something like:
    MEMPHYS 100 101
    MEMSWAP 20 40
    MEMACT 95 97
I pretty much don't care about MEMPHYS. The problem with MEMSWAP and MEMACT is that they work independently of each other - i.e. the above will give me a warning if > 95% of the RAM is used OR > 20% of swap is used, and an alert if > 97% of the RAM is used OR > 40% of swap is used.
However, this results in warnings for systems that have a lot of idle data in memory. The Linux kernel will page out idle data (increasing swap usage and reducing RAM usage) and use that space for buffers/caches, which is a very sensible strategy. Unfortunately, Xymon then notices that there's a lot of swap in use and throws an alert, even though there's plenty of RAM free.
Basically, I don't care that a machine is 4GB into swap if it has 5GB of free RAM - that isn't a problem, it just means there's quite a lot of idle data that the kernel has decided can be paged out. I do care if it's 4GB into swap and only has 0.5GB of free RAM, since this would indicate that it's actually short of memory.
What I really need is to warn if > x% of the RAM is used AND > y% of swap is used - is there a way to do that?
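To make the logic I'm after concrete, here is a rough sketch in Python. It is not something Xymon provides as far as I know; the function names (parse_meminfo, memory_alert) and the default thresholds are just placeholders for illustration, and I'm assuming "actual" RAM usage can be approximated from /proc/meminfo as MemTotal minus MemFree, Buffers and Cached.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_alert(info, ram_pct=97.0, swap_pct=40.0):
    """Return True only if BOTH RAM and swap usage exceed their thresholds.

    Assumption: "actual" RAM usage excludes free memory, buffers and page
    cache, which is only an approximation of what MEMACT reports.
    """
    total = info["MemTotal"]
    used = total - info["MemFree"] - info.get("Buffers", 0) - info.get("Cached", 0)
    ram_used_pct = 100.0 * used / total

    swap_total = info.get("SwapTotal", 0)
    if swap_total:
        swap_used_pct = 100.0 * (swap_total - info.get("SwapFree", 0)) / swap_total
    else:
        swap_used_pct = 0.0  # no swap configured, so swap can never trigger

    # The key difference from the current behaviour: AND instead of OR.
    return ram_used_pct > ram_pct and swap_used_pct > swap_pct


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print("alert" if memory_alert(parse_meminfo(f.read())) else "ok")
```

With this, a machine that is deep into swap but still has plenty of free RAM stays quiet, while one that is deep into swap and nearly out of RAM raises an alert.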
Thanks.
--
- Steve Hill Technical Director Opendium Limited http://www.opendium.com
Direct contacts: Instant messager: xmpp:steve at opendium.com Email: steve at opendium.com Phone: sip:steve at opendium.com
Sales / enquiries contacts: Email: sales at opendium.com Phone: +44-1792-824568 / sip:sales at opendium.com
Support contacts: Email: support at opendium.com Phone: +44-1792-825748 / sip:support at opendium.com