I had all of this working, then "something" changed, and now the majority of the groups defined in my analysis.cfg file no longer alert. I'm hoping it wasn't the upgrade from 4.3.28 to 4.3.30, but I'm not ruling anything out. I have sanitized the server name to foo.bar.com.
Analysis.cfg snippet:
HOST=foo*
    DISK /opt/sas 90 95 GROUP=sas_support
    DISK /opt/sas/9.4 90 95 GROUP=sas_support
    DISK /opt/sas/9.4/depot2 90 95 GROUP=sas_support
    DISK /opt/sas/saslanding 90 95 GROUP=sas_support
    DISK /opt/sas/saslanding/in 90 95 GROUP=sas_support
    DISK /opt/sas/saslanding/out 90 95 GROUP=sas_support
    DISK /opt/sas/sasmain 90 99 GROUP=sas_support
    DISK /opt/sas/sassecure 90 99 GROUP=sas_support
    DISK /opt/sas/sassecure/modelingcrm 95 98 GROUP=sas_support
    DISK /opt/sas/sassecure/servicing 95 99 GROUP=sas_support
    DISK /opt/sas/sassecure/servicing/SCRA/MOENDs 90 95 GROUP=sas_support
    DISK /opt/sas/saswork 50 70 GROUP=sas_support
Alerts.cfg snippet:
GROUP=sas_support SERVICE=disk COLOR=red # SAS Application support team
    SCRIPT /usr/local/xymon-server/server/ext/Create_SN_Ticket_From_Xymon-YP2-RP1.sh sas_support FORMAT=SMS DURATION>30 REPEAT=24h stop

GROUP=sas_support SERVICE=disk COLOR=yellow # SAS Application support team
    MAIL helpdesk at foo.com FORMAT=SMS DURATION<20 REPEAT=24h stop
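One quick thing to rule out is a plain name mismatch between the two files. The following is only a rough sketch in Python, not a Xymon tool: it collects every GROUP= token from each file's text and lists groups that analysis.cfg uses but alerts.cfg never mentions. It assumes group names are literal (no regex patterns) and may be comma-separated. The sample strings and the name dba_support are made up for illustration.

```python
import re

# Match GROUP=... but not EXGROUP=... (the lookbehind rejects a preceding letter).
GROUP_TOKEN = re.compile(r"(?<![A-Za-z])GROUP=(\S+)")


def groups_in(config_text):
    """Collect literal group names from GROUP= tokens, ignoring # comments."""
    names = set()
    for line in config_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        for token in GROUP_TOKEN.findall(line):
            # Assumption: several groups may be given as a comma-separated list.
            names.update(name for name in token.split(",") if name)
    return names


def unreferenced_groups(analysis_text, alerts_text):
    """Groups assigned in analysis.cfg that no alerts.cfg line refers to."""
    return sorted(groups_in(analysis_text) - groups_in(alerts_text))


# Hypothetical sample data; dba_support is an invented name.
ANALYSIS = """\
HOST=foo*
    DISK /opt/sas 90 95 GROUP=sas_support
    DISK /opt/oracle 90 95 GROUP=dba_support
"""
ALERTS = """\
GROUP=sas_support SERVICE=disk COLOR=red # SAS Application support team
GROUP=unix # Linux Team support (default contact)
"""

print(unreferenced_groups(ANALYSIS, ALERTS))
```

In practice the two strings would be read from the real analysis.cfg and alerts.cfg files. An empty result only rules out a spelling mismatch; it says nothing about why a rule fails to match.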
Obligatory test from the terminal:
[/usr/local/xymon-server/server/etc] --> ../bin/xymoncmd xymond_alert --test foo.bar.com disk --color=yellow --group=sas_support
00103435 2020-08-14 23:11:06 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 165
00103435 2020-08-14 23:11:06 Failed 'GROUP=sas_support SERVICE=disk COLOR=red' (group not in include list)
00103435 2020-08-14 23:11:06 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 168
00103435 2020-08-14 23:11:06 Failed 'GROUP=sas_support SERVICE=disk COLOR=yellow' (group not in include list)
--> ../bin/xymoncmd xymond_alert --test foo.bar.com disk --group=sas_support
00104898 2020-08-14 23:26:59 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 165
00104898 2020-08-14 23:26:59 Failed 'GROUP=sas_support SERVICE=disk COLOR=red' (group not in include list)
00104898 2020-08-14 23:26:59 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 168
00104898 2020-08-14 23:26:59 Failed 'GROUP=sas_support SERVICE=disk COLOR=yellow' (group not in include list)
However, the second ("red") test does match further along in the alerts file, just not against a rule with a GROUP definition, and the failure there is expected since I didn't specify a duration:
00104898 2020-08-14 23:26:59 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 304
00104898 2020-08-14 23:26:59 *** Match with 'HOST=%^.* SERVICE=disk COLOR=red' ***
00104898 2020-08-14 23:26:59 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 305
00104898 2020-08-14 23:26:59 Failed 'SCRIPT /usr/local/xymon-server/server/ext/Create_SN_Ticket_From_Xymon-YP2-RP1.sh UNIX FORMAT=SMS DURATION>5 REPEAT=25h' (min. duration 0<301)
Now get this. Here are two more examples from the alerts.cfg file:
GROUP=satellite SERVICE=disk #test comment
    MAIL coworker at foo.com FORMAT=SCRIPT stop

GROUP=unix # Linux Team support (default contact)
    MAIL unix-alert at lists.foo.com FORMAT=SMS DURATION<20 stop
And the respective tests from the terminal:
../bin/xymoncmd xymond_alert --test foo.bar.com disk --color=yellow --group=satellite
00104439 2020-08-14 23:21:49 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 147
00104439 2020-08-14 23:21:49 Failed 'GROUP=satellite SERVICE=disk' (group not in include list)
../bin/xymoncmd xymond_alert --test foo.bar.com disk --color=yellow --group=unix
00104535 2020-08-14 23:23:08 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 150
00104535 2020-08-14 23:23:08 *** Match with 'GROUP=unix' ***
00104535 2020-08-14 23:23:08 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 151
00104535 2020-08-14 23:23:08 *** Match with 'MAIL unix-alert at lists.foo.com FORMAT=SMS DURATION<20 stop' ***
00104535 2020-08-14 23:23:08 Mail alert with command 'mail unix-alert at lists.foo.com
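With many rules, the test output gets hard to read by eye. Here is a small Python sketch that condenses it into one entry per rule: the rule line number, whether it matched or failed, the rule text, and the failure reason. The patterns are inferred only from the log lines quoted in this post, so other Xymon versions may word things differently. The names summarize and SAMPLE are just illustrative.

```python
import re

# Patterns inferred from the xymond_alert --test output quoted above.
MATCHING = re.compile(r"against rule line (\d+)")
FAILED = re.compile(r"Failed '(.*)' \((.*)\)\s*$")
MATCHED = re.compile(r"\*\*\* Match with '(.*)' \*\*\*")


def summarize(log_text):
    """Return (rule_line, outcome, rule_text, reason) for each Failed/Match line."""
    results = []
    current_line = None  # rule line number from the latest "Matching ..." line
    for line in log_text.splitlines():
        m = MATCHING.search(line)
        if m:
            current_line = int(m.group(1))
            continue
        m = FAILED.search(line)
        if m:
            results.append((current_line, "failed", m.group(1), m.group(2)))
            continue
        m = MATCHED.search(line)
        if m:
            results.append((current_line, "matched", m.group(1), None))
    return results


# Lines copied from the tests in this post.
SAMPLE = """\
00103435 2020-08-14 23:11:06 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 165
00103435 2020-08-14 23:11:06 Failed 'GROUP=sas_support SERVICE=disk COLOR=red' (group not in include list)
00104535 2020-08-14 23:23:08 Matching host:service:dgroup:page 'foo.bar.com:disk:NONE:PROD/PSAS' against rule line 150
00104535 2020-08-14 23:23:08 *** Match with 'GROUP=unix' ***
"""

for rule_line, outcome, rule, reason in summarize(SAMPLE):
    print(rule_line, outcome, rule, reason or "")
```

The captured output of a test run could be fed to summarize() the same way, which makes it easier to compare results for different --group values side by side.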
I'm stumped. Anyone out there have any idea what might be incorrect?
Tim James
Senior System Administrator
Navient