I think I’m missing something. I have
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/hdv1 3768053780 1056651192 2711402588 29% /
none 131072 336 130736 1% /tmp
and they’re both graphed. Except the /dev/hdv1 is the entire array and I don’t want to monitor that – just /tmp. (In our last episode, I got it to monitor /tmp by grep -v-ing for tmpfs in the client’s DF line).
However… if I add to the host’s config a DISK IGNORE /, then while the text (as above) only shows /tmp, I lose /tmp’s graph line and it only graphs root. I’ve tried including the IGNORE line both before and after a line for /tmp, with no effect. As soon as I add the IGNORE line, /tmp’s graph line vanishes on next update.
Rob Munsch
I just compiled and installed RC1 on a Red Hat EL5.5 server this AM, and have come across a strange issue – don't know if it's my configuration or something with the new version
It seems like it isn't following the DURATION flag for alerts
I have some rules for a host; it matches on two lines:
./xymond_alert --test hostname.subdomain.domain.com disk
00003158 2011-01-24 13:51:05 send_alert hostname.subdomain.domain.com:disk state Paging
00003158 2011-01-24 13:51:05 Matching host:service:page 'hostname.subdomain.domain.com:disk:unified-tex' against rule line 128
00003158 2011-01-24 13:51:05 *** Match with 'HOST=% SERVICE=%disk|inode|procs|temperature|bbd|http|conn|ssh' ***
00003158 2011-01-24 13:51:05 Matching host:service:page 'hostname.subdomain.domain.com:disk:unified-tex' against rule line 134
00003158 2011-01-24 13:51:05 *** Match with 'HOST=% SERVICE=%disk|inode|procs|temperature' ***
Here are the relevant alerts.cfg sections:
128 HOST=% SERVICE=%disk|inode|procs|temperature|bbd|http|conn|ssh
129 MAIL=mailing-list at domain.com DURATION>20 REPEAT=60 COLOR=red RECOVERED

134 HOST=% SERVICE=%disk|inode|procs|temperature
135 MAIL=SMSgateways at txt.att.com TIME=06:0000:2359 DURATION>720 REPEAT=120 COLOR=red RECOVERED
136 MAIL=SMSgateways at txt.att.com TIME=12345:0000:0559 DURATION>720 REPEAT=120 COLOR=red RECOVERED
137 MAIL=SMSgateways at txt.att.com TIME=12345:0600:1759 DURATION>60 REPEAT=120 COLOR=red RECOVERED
138 MAIL=SMSgateways at txt.att.com TIME=12345:1800:2359 DURATION>720 REPEAT=120 COLOR=red RECOVERED
What I am expecting from these rules/matches is:
Email mailing-list if it's been red for > 20 minutes, and email on recovery.
Additionally:
On Sat and Sun, email my phone if it's been red for > 720 minutes.
On M-F, email my phone if it's been red for > 720 minutes before 6am, > 60 minutes after 6am, and > 720 minutes again after 6pm.
What is happening is:
When a red event occurs, it immediately sends an email to mailing-list at domain.com and an email to SMSgateways at txt.att.com.
Does anyone know if this is my configuration that needs fixing, or something with 4.3.0?
-Sean
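As a rough way to sanity-check the expectations above, the TIME and DURATION criteria from config lines 129 and 135-138 can be modelled in a few lines of Python. This is only a sketch, not Xymon's own matching code: it assumes weekday digits run 0=Sunday through 6=Saturday, that TIME ranges are inclusive HHMM bounds, and that DURATION is counted in minutes. It ignores REPEAT, COLOR and RECOVERED. The names RULES and matching_rules are made up for illustration.

```python
from datetime import datetime

# (config line, weekday digits, start HHMM, end HHMM, DURATION threshold in minutes)
# Copied from the quoted alerts.cfg. Line 129 has no TIME, so it is treated as "always".
# Assumption: weekday digits are 0=Sunday .. 6=Saturday.
RULES = [
    (129, "0123456", "0000", "2359", 20),
    (135, "06", "0000", "2359", 720),
    (136, "12345", "0000", "0559", 720),
    (137, "12345", "0600", "1759", 60),
    (138, "12345", "1800", "2359", 720),
]


def matching_rules(when, red_minutes):
    """Return the config line numbers whose TIME and DURATION criteria are met."""
    # Python counts Monday=0 .. Sunday=6; shift to Sunday=0 .. Saturday=6.
    weekday = str((when.weekday() + 1) % 7)
    hhmm = when.strftime("%H%M")
    return [
        line
        for line, days, start, end, min_duration in RULES
        if weekday in days and start <= hhmm <= end and red_minutes > min_duration
    ]


# Monday 2011-01-24 13:51, the timestamp from the test output above.
monday = datetime(2011, 1, 24, 13, 51)
print(matching_rules(monday, 0))    # []
print(matching_rules(monday, 30))   # [129]
print(matching_rules(monday, 61))   # [129, 137]
```

Under this model, nothing should be sent the moment a status turns red, which is what the rules were written to achieve.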
Please do not thread hijack (that is to say, reply to a message, change its contents and subject and send that as a new message). These messages still contain an In-Reply-To: header that irritates people with a threaded mail reader. Please compose new messages instead.
I'd post a link on etiquette but apparently someone has erased the Wikipedia definition.
Ryan Novosielski
On Mon, 24 Jan 2011 14:06:14 -0500, Clark, Sean wrote:
It seems like it isn't following the DURATION flag for alerts
I have some rules for a host, it matches on two lines:
./xymond_alert --test hostname.subdomain.domain.com disk
00003158 2011-01-24 13:51:05 send_alert hostname.subdomain.domain.com:disk state Paging
00003158 2011-01-24 13:51:05 Matching host:service:page 'hostname.subdomain.domain.com:disk:unified-tex' against rule line 128
00003158 2011-01-24 13:51:05 *** Match with 'HOST=%
"HOST=%" really is invalid. You're matching against an empty regular expression, which I guess will match anything. So you could just as well have "HOST=*" or completely drop the HOST criteria.
And if you want to test DURATION rules, you must use the "--duration=SECONDS" option for xymond_alert - see the man-page.
Finally, I'd suggest that you use the "--cfid" option to get an indication of which line in the alert configuration is triggering the alerts. You can use that on the normal alert task as well, in which case it will be included in the subject line for the alerts.
And you can always look at the "info" status for a host to see how the alert configuration is interpreted. That is often easier to understand than the output from the xymond_alert "test" function.
Regards, Henrik
Clark, Sean <mailto:sean.clark at twcable.com> wrote:
I just compiled and installed RC1 on a Red Hat EL5.5 server this AM, and have come across a strange issue - don't know if it's my configuration or something with the new version
It seems like it isn't following the DURATION flag for alerts
<snip>
I'm going to come at this one from the other direction and ignore the tests you did which, as Henrik pointed out, are invalid. Instead I'm going to ask you: do these e-mails follow reds, which follow yellows for a period greater than DURATION? Perhaps, e.g., your disk is permanently yellow, except for when it goes red, and then the alert triggers straight away? Do you have yellow listed as one of your ALERTCOLORS? If so, then I think this may be the 'cause', though I have posted before about how I think this should be made more flexible....
One question I have if Henrik is reading this is, can I use custom colours, e.g. orange, and put that in ALERTCOLORS instead of yellow? I suppose I could just try it and see what happens (since the only yellows that I want to alert on are custom tests anyway)...
Regards,
SebA
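The scenario SebA describes can be illustrated with a toy timeline. This is a hypothetical sketch of the idea only, not Xymon's actual bookkeeping: it assumes yellow is listed in ALERTCOLORS and compares two possible ways of measuring how long a status has been "in alert". The function names and the sample history are invented for illustration.

```python
ALERT_COLORS = {"red", "yellow"}  # assumption: yellow is one of the ALERTCOLORS


def minutes_in_alert(history, now):
    """Minutes since the status last entered any alert color.

    history is a time-ordered list of (minute, color) status changes.
    """
    start = None
    for minute, color in history:
        if color in ALERT_COLORS:
            if start is None:
                start = minute  # first alert color after a non-alert color
        else:
            start = None  # back to a non-alert color: the event is over
    return 0 if start is None else now - start


def minutes_in_current_color(history, now):
    """Minutes since the most recent color change."""
    return now - history[-1][0]


# Green at t=0, yellow at t=10, red at t=100; evaluated the moment it turns red.
history = [(0, "green"), (10, "yellow"), (100, "red")]
print(minutes_in_alert(history, 100))          # 90
print(minutes_in_current_color(history, 100))  # 0
```

If the duration is measured the first way, a rule with DURATION>20 and COLOR=red would fire as soon as the status turns red, because the event has already lasted 90 minutes.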
"SebA" <spa at syntec.co.uk> writes:
One question I have if Henrik is reading this is, can I use custom colours, e.g. orange, and put that in ALERTCOLORS instead of yellow?
No. Xymon only knows the colors that are built into it: green, yellow, red, purple, clear and blue. Anything else is ignored.
Regards, Henrik
On 01/24/2011 11:01 AM, Rob Munsch wrote:
However… if I add to the host’s config a DISK IGNORE /, then while the text (as above) only shows /tmp, I lose /tmp’s graph line and it only graphs root. I’ve tried including the IGNORE line both before and after a line for /tmp, with no effect. As soon as I add the IGNORE line, /tmp’s graph line vanishes on next update.
The only issue I've ever had was ordering. I believe you have to make sure that your defaults are last or something.
Ryan Novosielski
On Mon, 24 Jan 2011 11:01:25 -0500, Rob Munsch wrote:
However… if I add to the host’s config a DISK IGNORE /, then while the text (as above) only shows /tmp, I lose /tmp’s graph line and it only graphs root. I’ve tried including the IGNORE line both before and after a line for /tmp, with no effect. As soon as I add the IGNORE line, /tmp’s graph line vanishes on next update.
Try this:
HOST=foo
    DISK %^/$ IGNORE
so you use a regex pattern that only matches "/" to ignore that filesystem.
Regards, Henrik
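To see why the anchored pattern in Henrik's suggestion singles out the root filesystem, here is a small Python sketch. Xymon does its own regex matching, so Python's re module is only a stand-in here; for a pattern this simple the behaviour should be the same, but treat that as an assumption. The mount list comes from the df output at the top of the thread.

```python
import re

mounts = ["/", "/tmp"]

# The pattern from the suggested rule, without the leading "%" marker.
root_only = re.compile(r"^/$")
# An unanchored "/" for comparison: it is found in every absolute path.
any_slash = re.compile(r"/")

print([m for m in mounts if root_only.search(m)])  # ['/']
print([m for m in mounts if any_slash.search(m)])  # ['/', '/tmp']
```

The ^ and $ anchors are what keep /tmp out of the match.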
I was expecting this to work - but now I have
Filesystem 1024-blocks Used Available Capacity Mounted on
none 131072 15456 115616 12% /tmp
and below it, / is graphed. The graph line for /tmp vanished. I'm very confused.
On Tue, 25 Jan 2011 09:49:03 -0500, Rob Munsch wrote:
I was expecting this to work - but now I have
Filesystem 1024-blocks Used Available Capacity Mounted on
none 131072 15456 115616 12% /tmp
and below it, / is graphed. The graph line for /tmp vanished. I'm very confused.
Hmm, yes - that can happen because you now have two RRD graph files, but only one graph showing up on the webpage. You should have both of them on the graph in the "trends" column, though.
Either wait 48 hours - then the root-filesystem graph will go "stale" and automatically be ignored on the "disk" status graph display. Or you can go to the ~hobbit/data/rrd/HOSTNAME/ directory and delete/rename the "disk,root.rrd" file out of the way.
Regards, Henrik
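The manual clean-up described above could be scripted roughly as follows. This is a minimal sketch, assuming the per-host RRD directory layout and the "disk,root.rrd" file name mentioned in the message. The helper names, the ".old" suffix, and the use of file modification time as a hint of how recently an RRD was updated are illustrative assumptions, not anything Xymon prescribes.

```python
import time
from pathlib import Path


def disk_rrd_ages(rrd_dir, now=None):
    """Map each disk RRD file name to the hours since it was last modified."""
    now = time.time() if now is None else now
    return {
        p.name: (now - p.stat().st_mtime) / 3600
        for p in Path(rrd_dir).glob("disk,*.rrd")
    }


def move_aside(rrd_dir, name="disk,root.rrd"):
    """Rename an unwanted RRD file so it no longer matches disk,*.rrd."""
    src = Path(rrd_dir) / name
    dst = src.with_name(src.name + ".old")
    src.rename(dst)
    return dst


# Example usage (left commented out because it touches real files):
# rrd_dir = Path("~hobbit/data/rrd/HOSTNAME").expanduser()
# print(disk_rrd_ages(rrd_dir))
# move_aside(rrd_dir)
```

Renaming rather than deleting keeps the old data around in case the root filesystem graph is wanted again later.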
-----Original Message-----
From: Henrik Størner [mailto:henrik at hswn.dk]
Sent: Tuesday, January 25, 2011 10:22 AM
To: xymon at xymon.com
Subject: Re: [xymon] DISK IGNORE useage
Hmm, yes - that can happen because you now have two RRD graph files, but only one graph showing up on the webpage. You should have both of them on the graph in the "trends" column, though.
Either wait 48 hours - then the root-filesystem graph will go "stale" and automatically be ignored on the "disk" status graph display. Or you can go to the ~hobbit/data/rrd/HOSTNAME/ directory and delete/rename the "disk,root.rrd" file out of the way.
That did it, Henrik - thanks!
participants (5)
- henrik@hswn.dk
- Munsch@phillycarshare.org
- novosirj@umdnj.edu
- sean.clark@twcable.com
- spa@syntec.co.uk