On Wed, Feb 02, 2005 at 03:26:25PM +0000, David Gore wrote:
My 15 minute DURATION fired. I don't think it is a coincidence that it fired at 1 day and 5 hours. I suspect the problem is the possible bug mentioned earlier, where specifying 15m gives you a particularly large number.
I've been testing with DURATION>10 since I last posted to the list. It showed up as 600s, and it was only tested once:
"failed minduration 0<600"
I would've expected to see something like this:
Start hobbit, it runs through all the alerts at time=0 when "duration=0"
page.log <snip> "failed minduration 0<600"
5 mins later, when it checks again with "duration=300"
<snip> "failed minduration 300<600"
5 mins later, duration=minduration and it doesn't fail the test, so it's time to send an alert.
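To make the sequence above concrete, here is a small Python sketch of the check as I would expect it to behave. This is not Hobbit's actual code: the function names, the 300-second polling interval and the exact log wording are assumptions taken from the lines above.

```python
def check_minduration(duration_s, minduration_s):
    """Return (passed, log_line) for a single evaluation of the rule.

    Hypothetical helper: the rule passes once the event has lasted
    at least minduration_s seconds.
    """
    if duration_s < minduration_s:
        return False, f"failed minduration {duration_s}<{minduration_s}"
    return True, None


def simulate(minduration_s=600, poll_interval_s=300, polls=3):
    """Evaluate the rule once per poll, starting at duration 0."""
    results = []
    for i in range(polls):
        duration = i * poll_interval_s
        passed, log_line = check_minduration(duration, minduration_s)
        results.append((duration, passed, log_line))
    return results


if __name__ == "__main__":
    for duration, passed, log_line in simulate():
        print(duration, "send alert" if passed else log_line)
```

With DURATION>10 (600s) and a check every 5 minutes, the first two passes log a failure and the third one is the first that should send an alert.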
Or, quite possibly, I don't know what I am talking about.
Henrik Stoerner wrote:
I tend to agree, but I've been too busy with "real" work these past days, so I haven't had time to investigate it.
If I can be of any assistance in helping debug this by testing patches or alert conditions, just ask.
But it definitely showed that the repeat thing works ... I got about 900 mails for different services that failed because my external gateway was down.
:) Nothing better than a real-world event to stress test a monitoring system...