Hi Henrik,
The alert trace is still on today and, yes, yesterday the script was executed correctly with a tiny test config. The original config still gives me problems. I checked the hobbit-alerts.cfg file for control characters (vi -> :set list) and found nothing weird.
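As a cross-check on the manual vi inspection, something like the following could scan the file mechanically. This is only a minimal sketch: the function name scan_config is made up for illustration, and the "run of spaces" check is based purely on the suspicion mentioned further down in this message, not on any documented Hobbit behaviour.

```python
import re

# Control characters other than tab (\x09) and newline (\x0a).
# Carriage returns (\x0d) are flagged too, so DOS line endings show up.
CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")

# Two or more consecutive spaces between non-blank text (assumed suspicious).
MULTISPACE = re.compile(r"\S {2,}\S")


def scan_config(text):
    """Return a list of (line_number, finding) tuples for suspicious lines."""
    findings = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if CONTROL.search(line):
            findings.append((lineno, "control character"))
        if MULTISPACE.search(line):
            findings.append((lineno, "run of spaces"))
    return findings


# Hypothetical usage; open with newline="" so carriage returns are preserved:
#   with open("hobbit-alerts.cfg", newline="") as f:
#       for lineno, finding in scan_config(f.read()):
#           print(lineno, finding)
```

An empty result would agree with what vi showed; any hit gives a line number to look at.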
Part of the hobbit-alerts.cfg:
- Some macros (the first one, $UNIXTEST, is enabled now and then for testing purposes):
###$UNIXTEST=MAIL me at somedomain.nl DURATION>6m TIME=W:0800:1730 REPEAT=1d RECOVERED COLOR=yellow,red,purple
$UNIXDAG=MAIL somewhere at somedomain.nl DURATION>6m TIME=W:0800:1730 REPEAT=1d RECOVERED
$UNIXNACHT=MAIL somewhere at somedomain.nl TIME=*:0000:2359 DURATION>30m REPEAT=1d SERVICE=!cpu,!msgs RECOVERED COLOR=!yellow
$UNIXSEMAFOON_BEHEER=SCRIPT /usr/local/bb/consigne.ksh 00765327285 FORMAT=SMS TIME=*:0000:2359 DURATION>30m REPEAT=60m SERVICE=!cpu,!msgs,!smtp,!bbgen,!bbtest,!hobbitd COLOR=!yellow
- A host for which $UNIXSEMAFOON_BEHEER does not respond, although the yellow $UNIXDAG mail has been sent:
HOST=%(orwell) $UNIXDAG $UNIXTEST $UNIXNACHT $UNIXSEMAFOON_BEHEER
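To see what this rule line looks like once the macros are filled in, here is a rough sketch. It assumes that macros are plain text substitution ($NAME=text up to end of line, replaced wherever $NAME appears). The helper name expand_macros is made up, and leaving an undefined macro as literal text is a choice of this sketch, not a statement about what Hobbit does.

```python
import re

MACRO_DEF = re.compile(r"^(\$[A-Za-z0-9_]+)=(.*)$")
MACRO_USE = re.compile(r"\$[A-Za-z0-9_]+")


def expand_macros(lines):
    """Expand $MACRO references in non-comment lines (assumed semantics)."""
    macros = {}
    expanded = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comments, including commented-out macros.
        if not stripped or stripped.startswith("#"):
            continue
        definition = MACRO_DEF.match(stripped)
        if definition:
            macros[definition.group(1)] = definition.group(2)
            continue
        # Undefined macros are left untouched so they stand out in the output.
        expanded.append(
            MACRO_USE.sub(lambda m: macros.get(m.group(0), m.group(0)), stripped)
        )
    return expanded
```

Fed the macro definitions and the HOST line above, this prints one long rule line. With $UNIXTEST commented out, the literal text $UNIXTEST remains in that line, which makes it easy to spot.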
The host does give me an email when a threshold is exceeded (disk>95%), and that can be seen in the trace (I only grepped the host-specific entries):
00013241 2005-08-18 10:04:45 *** Match with 'HOST=%(orwell)' ***
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 191
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 193
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 194
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 196
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 203
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 209
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 216
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 223
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 229
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 236
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 242
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 254
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 261
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 268
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 275
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 282
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 287
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 294
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 300
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 304
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 311
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 322
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 332
00013241 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 340
00013241 2005-08-18 10:04:45 Failed 'HOST=%(orwell)' (hostname not in include list)
00015024 2005-08-18 10:04:45 send_alert orwell:disk state Paging
00015024 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 184
00015024 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 190
00015024 2005-08-18 10:04:45 *** Match with 'HOST=%(orwell)' ***
00015024 2005-08-18 10:04:45 Matching host:service:page 'orwell:disk:DNO/SBEHEER' against rule line 191
00015024 2005-08-18 10:04:45 Mail alert with command 'mail -s "Hobbit [25437] orwell:disk CRITICAL (RED)" central at somedomain.nl'
But the next (expected) step, the page via the script, cannot be seen in the trace, and it does not occur.
All this could be just a configuration issue, so I restored another tiny config and restarted Hobbit, and that worked fine. So no problems with the mail or script etc :-]
So, now I did the following:
- I restored the hobbit-alerts.cfg we must use.
- I uncommented my $UNIXTEST macro to prevent empty lines in the HOST sections of hobbit-alerts.cfg, knowing that Hobbit can have problems with 2 or more spaces (perhaps newlines too?).
- I moved the $UNIXTEST macro to the end of each HOST section, for the times I comment out the previous line ;-)
- I restarted Hobbit.
- Now the first alert is sent as it should be, but the one alert that should page after 30 minutes fails, and nothing about it shows up in the logfile.
Regards,
Peter