We had an alert that was yellow for several hours, then turned red. It immediately paged *all* the way up the food chain. The rules seem to be correct, see tests below; "alert1" through "alert4" are SMS aliases.
Does the length of time an alert is yellow count towards the duration once it turns red? If so, can I change this, or is there a better way to do it?
In this case, it was a disk filling up... disks can often stay yellow for several hours. Having a disk go from 94% full to 95% is something we want to alert the tech on duty about, but not wake everyone up for.
Thanks much,
Betsy
MAIL xymail REPEAT=1d RECOVERED # notify techops
MAIL ticket REPEAT=365d COLOR=yellow DURATION>20 # open ticket
MAIL alert1 REPEAT=10 COLOR=red,purple FORMAT=SMS # page onshift or oncall at start RED, repeat every 10 minutes
MAIL alert2 DURATION>20 REPEAT=10 COLOR=red,purple FORMAT=SMS # page secondary after 20 mins RED, repeat every 10 minutes
MAIL alert3 DURATION>40 REPEAT=10 COLOR=red,purple FORMAT=SMS # page tertiary after 40 mins RED, repeat every 10 minutes
MAIL alert4 DURATION>60 REPEAT=10 COLOR=red,purple FORMAT=SMS # page mgr after 60 mins RED, repeat every 10 minutes
-- (domain name removed)
[xymon at netmon2 etc]$ ../bin/xymond_alert --test mmf4 disk --duration=5 |grep mail
00022750 2011-04-04 21:51:45 *** Match with 'MAIL xymail REPEAT=1d RECOVERED' ***
00022750 2011-04-04 21:51:45 Mail alert with command '/var/spool/mail/xymon "Xymon [12345] mmf4:disk CRITICAL (RED)" xymail'
00022750 2011-04-04 21:51:45 Mail alert with command 'mail alert1'
[xymon at netmon2 etc]$ ../bin/xymond_alert --test mmf4 disk --duration=15 |grep mail
00022752 2011-04-04 21:51:58 *** Match with 'MAIL xymail REPEAT=1d RECOVERED' ***
00022752 2011-04-04 21:51:58 Mail alert with command '/var/spool/mail/xymon "Xymon [12345] mmf4:disk CRITICAL (RED)" xymail'
00022752 2011-04-04 21:51:58 Mail alert with command 'mail alert1'
[xymon at netmon2 etc]$ ../bin/xymond_alert --test mmf4 disk --duration=25 |grep mail
00022754 2011-04-04 21:52:06 *** Match with 'MAIL xymail REPEAT=1d RECOVERED' ***
00022754 2011-04-04 21:52:06 Mail alert with command '/var/spool/mail/xymon "Xymon [12345] mmf4:disk CRITICAL (RED)" xymail'
00022754 2011-04-04 21:52:06 Mail alert with command 'mail alert1'
00022754 2011-04-04 21:52:06 Mail alert with command 'mail alert2'
[xymon at netmon2 etc]$ ../bin/xymond_alert --test mmf4 disk --duration=65 |grep mail
00022767 2011-04-04 21:52:37 *** Match with 'MAIL xymail REPEAT=1d RECOVERED' ***
00022767 2011-04-04 21:52:37 Mail alert with command '/var/spool/mail/xymon "Xymon [12345] mmf4:disk CRITICAL (RED)" xymail'
00022767 2011-04-04 21:52:37 Mail alert with command 'mail alert1'
00022767 2011-04-04 21:52:37 Mail alert with command 'mail alert2'
00022767 2011-04-04 21:52:37 Mail alert with command 'mail alert3'
00022767 2011-04-04 21:52:37 Mail alert with command 'mail alert4'
[xymon at netmon2 etc]$
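
To make the question concrete, here is a minimal Python sketch of the rules above. It is only a toy model, not how xymond_alert is implemented: the `Rule` class and `recipients` function are hypothetical names, REPEAT and RECOVERED handling are ignored, and the idea that the duration clock might start when the status first goes yellow is the hypothesis being asked about, not confirmed behaviour.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    """Toy stand-in for one MAIL line (hypothetical, not a Xymon API)."""
    recipient: str
    colors: Optional[FrozenSet[str]] = None  # None: no COLOR= filter
    min_duration: Optional[int] = None       # N from DURATION>N, in minutes


RED_PURPLE = frozenset({"red", "purple"})

# Mirrors the MAIL lines quoted in this post, in the same order.
RULES = [
    Rule("xymail"),
    Rule("ticket", frozenset({"yellow"}), 20),
    Rule("alert1", RED_PURPLE),
    Rule("alert2", RED_PURPLE, 20),
    Rule("alert3", RED_PURPLE, 40),
    Rule("alert4", RED_PURPLE, 60),
]


def recipients(color: str, duration_minutes: int) -> List[str]:
    """Return the recipients whose COLOR and DURATION filters both match."""
    matched = []
    for rule in RULES:
        color_ok = rule.colors is None or color in rule.colors
        duration_ok = (rule.min_duration is None
                       or duration_minutes > rule.min_duration)
        if color_ok and duration_ok:
            matched.append(rule.recipient)
    return matched


if __name__ == "__main__":
    # Same durations as the --test runs above.
    for minutes in (5, 15, 25, 65):
        print(minutes, recipients("red", minutes))

    # Disk was yellow for 3 hours, then turns red.
    # Assumption A: the clock started at yellow, so duration is already 180.
    print("clock from yellow:", recipients("red", 180))
    # Assumption B: the clock restarts at red, so duration is 0.
    print("clock from red:", recipients("red", 0))
```

Under assumption A the model pages alert1 through alert4 at the moment the status turns red, which matches what we saw. Under assumption B only alert1 would be paged, which is the behaviour we want.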