Good point!
Thanks for finding my message and replying. That gives me something to try.
- Chris
From: Jeremy Laidman <jeremy at laidman.org>
Sent: Tuesday, August 21, 2018 8:26 AM
To: Seip, Christopher (HPN SIS team) <chris.seip at hpe.com>
Cc: xymon at xymon.com
Subject: Re: [Xymon] Setting thresholds in analysis.cfg
Chris
I think this is the key part of the man page:
HOST=targetstring Rule matching a host by the hostname. "targetstring" is either a comma-separated list of hostnames (from the hosts.cfg file), "*" to indicate "all hosts", or a Perl-compatible regular expression.
Are your host definitions comma-separated lists, or PCREs? They can't be both.
So none of your hosts match, and the DEFAULT stanza is the one that applies.
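To illustrate why nothing matches, here is a rough sketch of the matching rule as the man page describes it. This is not Xymon's code: the function name host_rule_matches is made up, Python's re module stands in for PCRE, and it assumes (as I understand the Xymon convention) that a leading "%" marks the whole target string as one regular expression, so the comma and the second "%" become literal pattern text.

```python
import re


def host_rule_matches(targetstring: str, hostname: str) -> bool:
    """Hypothetical model of HOST= matching; not Xymon's actual implementation."""
    if targetstring == "*":
        return True
    if targetstring.startswith("%"):
        # Assumption: everything after the leading "%" is a single regex,
        # so ",%swnfs06" is treated as literal text the hostname must contain.
        return re.search(targetstring[1:], hostname) is not None
    # Otherwise: a plain comma-separated list of exact hostnames.
    return hostname in targetstring.split(",")


host = "swnfs06.rose.rdlabs.hpecorp.net"

# The form used in the config: one regex containing a literal ",%".
mixed = host_rule_matches("%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06", host)
# A plain comma-separated list of names.
plain_list = host_rule_matches("swnfs06.rose.rdlabs.hpecorp.net,swnfs06", host)
# A single regex covering both the short and the full name.
single_regex = host_rule_matches(r"%^swnfs06(\.|$)", host)

print(mixed, plain_list, single_regex)
```

Under those assumptions, either "HOST=swnfs06.rose.rdlabs.hpecorp.net" (a plain name as it appears in hosts.cfg) or one regex such as "HOST=%^swnfs06(\.|$)" would select the host, while the mixed form selects nothing.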
J
On 9 June 2018 at 02:43, Seip, Christopher (HPN SIS team) <chris.seip at hpe.com> wrote:
I could use a hand getting the basics of analysis.cfg worked out, please. Here's mine:
egrep -v '^#' analysis.cfg
HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06
        DISK /disk/data 50 55
        DISK * 90 95
        MEMSWAP 80 90

HOST=%swnfs07.rose.rdlabs.hpecorp.net,%swnfs07
        DISK /disk/data 92 96
        DISK * 90 95

HOST=%hpnsvr18.rose.rdlabs.hpecorp.net,%hpnsvr18
        DISK /BACKUP 98 99
        DISK * 90 95

DEFAULT
        # Ignore some usually uninteresting tmpfs mounts.
        DISK /dev IGNORE
        DISK /dev/shm IGNORE
        DISK /lib/init/rw IGNORE
        DISK /run IGNORE
        # These are the built-in defaults. You should only modify these
        # lines, not add new ones (no PROC, DISK, LOG ... lines).
        UP 1h
        LOAD 5.0 10.0
        DISK * 90 95
        INODE * 70 90
        MEMPHYS 100 101
        MEMSWAP 50 80
        MEMACT 90 97
Three issues with this:
Swap consumption in the first host, swnfs06, has been steady at 74%, so I was trying to hush the alerts with the MEMSWAP line. This change hasn't had any effect; I am still getting a memory low yellow-warning for swap/page usage on swnfs06.
On the same swnfs06 host, its /disk/data partition is 56% full, so my "DISK..50 60" line was an attempt to trigger a yellow alert. I was testing my understanding of the analysis.cfg file, but the filesystems test remains green.
And my 96% full /BACKUP drive on hpnsvr18 is issuing a red alert for being over the panic level of 95%, where I was trying to set the panic level at 99%.
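All three symptoms are what you would expect if only the DEFAULT thresholds were in effect. A small sketch to check that arithmetic, using the percentages reported above and the warn/panic pairs from the config. The helper status_color is hypothetical, and it assumes a value at or above a level triggers that level; the exact boundary behavior in Xymon is not verified here and does not matter for these numbers.

```python
def status_color(value, warn, panic):
    """Return the color for a usage percentage under warn/panic levels.

    Assumption: a value at or above a level triggers it.
    """
    if value >= panic:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"


# (label, observed %, DEFAULT thresholds, intended per-host thresholds)
cases = [
    ("swnfs06 swap", 74, (50, 80), (80, 90)),
    ("swnfs06 /disk/data", 56, (90, 95), (50, 55)),
    ("hpnsvr18 /BACKUP", 96, (90, 95), (98, 99)),
]

results = {}
for label, observed, default, intended in cases:
    # First color: what DEFAULT gives. Second: what the HOST rule would give.
    results[label] = (
        status_color(observed, *default),
        status_color(observed, *intended),
    )
    print(label, results[label])
```

The first color in each pair matches what is being reported (yellow swap, green /disk/data, red /BACKUP). Note that with the "50 55" values in the config as posted, 56% on /disk/data would come out red rather than yellow once the rule actually applied.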
After wrestling with the man page and many experiments, I'm tossing this to the group for help. Seems very basic, but it's just not working for me. What'm I missing?
I tried switching to the "threshold hostname" format, like this:
egrep -v '^#' analysis.cfg | head -11
DISK /disk/data 50 55 HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06
DISK * 90 95 HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06
MEMSWAP 80 90 HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06

DISK /disk/data 92 96 HOST=%swnfs07.rose.rdlabs.hpecorp.net,%swnfs07
DISK * 90 95 HOST=%swnfs07.rose.rdlabs.hpecorp.net,%swnfs07

DISK /BACKUP 98 99 HOST=%hpnsvr18.rose.rdlabs.hpecorp.net,%hpnsvr18
DISK * 90 95 HOST=%hpnsvr18.rose.rdlabs.hpecorp.net,%hpnsvr18
This produced no change in behavior. I am stopping and starting the Xymon server software and waiting for new html pages to generate after every change in the analysis.cfg.
In my configuration report, I can see that every server configured for local memory tests has acquired the 80%/90% threshold setting, not just swnfs06. And my "DISK /disk/data 50 55" is having no effect at all on any host: The strings "50%" or "60%" appear nowhere in my configuration report.
egrep 'swnfs0[67]' hosts.cfg
16.93.247.204 swnfs06.rose.rdlabs.hpecorp.net # rpc=mount,nlockmgr,nfs,ypbind ssh
16.93.247.205 swnfs07.rose.rdlabs.hpecorp.net # NOCOLUMNS=files rpc=mount,nlockmgr,nfs,ypbind ssh
16.93.247.204 swnfs06.rose.rdlabs.hpecorp.net
xymoncmd xymond_client --dump-config
DISK /disk/data 50% 55% 0 -1 red HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06 (line: 351)
DISK * 90% 95% 0 -1 red HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06 (line: 352)
MEMSWAP 80 90 HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06 (line: 353)
DISK /disk/data 92% 96% 0 -1 red HOST=%swnfs07.rose.rdlabs.hpecorp.net,%swnfs07 (line: 355)
DISK * 90% 95% 0 -1 red HOST=%swnfs07.rose.rdlabs.hpecorp.net,%swnfs07 (line: 356)
DISK /BACKUP 98% 99% 0 -1 red HOST=%hpnsvr18.rose.rdlabs.hpecorp.net,%hpnsvr18 (line: 360)
DISK * 90% 95% 0 -1 red HOST=%hpnsvr18.rose.rdlabs.hpecorp.net,%hpnsvr18 (line: 361)
DISK /dev IGNORE (line: 371)
DISK /dev/shm IGNORE (line: 372)
DISK /lib/init/rw IGNORE (line: 373)
DISK /run IGNORE (line: 374)
UP 3600 -1 (line: 377)
LOAD 5.00 10.00 (line: 378)
DISK * 90% 95% 0 -1 red (line: 379)
INODE * 70% 90% 0 -1 red (line: 380)
MEMREAL 100 101 (line: 381)
MEMSWAP 50 80 (line: 382)
MEMACT 90 97 (line: 383)
Thanks for any insights you can provide. Feels like I'm making a wrong assumption about how analysis.cfg works.
Best thing I can figure to do would be to switch to local configuration of my Xymon clients, but I'd rather manage custom thresholds centrally.
- Chris
Xymon mailing list
Xymon at xymon.com
http://lists.xymon.com/mailman/listinfo/xymon