duration of MSG red status

nskyrca＠syr.edu

24 Oct 2014

3:13 p.m.

Hello, I think this has been asked before, but it was a long time ago and I wondered if something changed since then.

How long will the MSGS status stay red? We just set up monitoring recently, and it seems to stay red for about 30 minutes. Is that normal? Is this configurable?

We are running Xymon server 4.2.3.

Thanks, Nicole

cleaver＠terabithia.org

24 Oct

6:50 p.m.

On Fri, October 24, 2014 8:13 am, Nicole Beck wrote:

...

Nicole,

The duration of the 'msgs' test is actually a function of how many cycles back logfetch will scan for content to include in the log data going forward (the color itself is calculated from the regexes evaluated by xymond_client).

logfetch will look back 6 runtime-positions which, combined with the default xymonclient run interval of 5m, ends up causing the 30m figure.

The former value is compiled in; the run frequency, however, is configurable. (We run our clients on 100s cycles, which means our msgs tests last for 10-12m.)

I'm not sure how easy it would be to make the 6 positions dynamic or a runtime option, but that would be nice.

Regards,

-jc
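
As a rough illustration of the arithmetic above, the sketch below multiplies the look-back count by the client run interval. The value 6 is simply the compiled-in figure quoted in this message, and msgs_window is a made-up helper name, not part of Xymon.

```shell
#!/bin/sh
# Look-back count as quoted in this thread (assumed, not read from Xymon).
LOOKBACK=6

# msgs_window INTERVAL_SECONDS: print the approximate window in seconds.
msgs_window() {
    echo $(( LOOKBACK * $1 ))
}

# Default 5-minute interval, and the 100-second interval mentioned above.
for interval in 300 100; do
    secs=$(msgs_window "$interval")
    echo "client interval ${interval}s: about $(( secs / 60 )) minutes"
done
```

The 100-second case comes out at 10 minutes, the low end of the 10-12m range reported above.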

novosirj＠ca.rutgers.edu

9:41 p.m.

...

On Oct 24, 2014, at 14:50, J.C. Cleaver <cleaver at terabithia.org> wrote:

...

Could have sworn the number of lines to look at was configurable too. Maybe I'm thinking of BB?

waa-hobbitml＠revpol.com

26 Oct

2:26 p.m.

On 10/24/2014 05:41 PM, Novosielski, Ryan wrote:

...

Could have sworn the number of lines to look at was configurable too. Maybe I'm thinking of BB?

Hi Ryan, I was thinking the same thing, but I think we may be thinking of the max bytes to send. From the client-local.cfg docs:

log:/var/log/messages:10240 - The log:FILENAME:SIZE line defines the filename of the log, and the maximum amount of data (in bytes) to send to the Xymon server.

This thread caused me to start thinking about a similar problem I have not had time to look into for a long time, and I think Xymon has an option that might fix both of our problems.

My situation: I have a custom script on a server that checks licenses for Zimbra email archiving accounts. If all the available "archiving account" licenses have been used, and an attempt is made to create an archiving account, the script will log:

"error: ArchivingAccountsLimit exceeded: 163/125"

When I set up this script and the Xymon logfile test, I tested it and Xymon properly reported yellow, so I thought I was set. I didn't realize that it was only staying yellow for 30 minutes. So once my testing was done, I set the script to run at 2:00am daily and thought I was done.

Unfortunately, this just means that every morning at 2am this test goes yellow for 30 minutes and is green by the time the IT people come in. (They do not get/want alerts for anything other than some temperatures currently)

So... while re-investigating this, I see that the client-local.cfg has an optional trigger:PATTERN option for logfiles which states:

"The trigger PATTERN line (optional) is used only when there is more data in the log than the maximum size set in the "log:FILENAME:SIZE" line. The "trigger" pattern is then used to find particularly interesting lines in the logfile - these will always be sent to the Xymon server. After picking out the "trigger" lines, any remaining space up to the maximum size is filled in with the most recent entries from the logfile. "PATTERN" is a regular expression."

I have not tested this, but it would seem to indicate that it would cause the client to send the Xymon server all the lines that match the trigger pattern (regardless of how far back in time they go in the logfile) which should cause the test to stay non-green until the logfile is rotated and no more lines with the trigger pattern exist.
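
For reference, the configuration being described would look roughly like the fragment below. The section name is a placeholder host and the pattern is taken from the log line quoted earlier; the two line types follow the client-local.cfg documentation quoted above. It is an untested illustration only.

```
[zimbra.example.com]
log:/var/log/messages:10240
trigger ArchivingAccountsLimit
```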

Can anyone confirm or deny this functionality?

Bill

-- Bill Arlofski Reverse Polarity, LLC http://www.revpol.com/ -- Not responsible for any advertising below this line --

jlaidman＠rebel-it.com.au

27 Oct

8:45 p.m.

On 27 October 2014 01:26, Bill Arlofski <waa-hobbitml at revpol.com> wrote:

...

I have not tested this, but it would seem to indicate that it would cause the client to send the Xymon server all the lines that match the trigger pattern (regardless of how far back in time they go in the logfile) which should cause the test to stay non-green until the logfile is rotated and no more lines with the trigger pattern exist.

I haven't verified this, but my understanding of how the "logfetch" process works is that it keeps state of where it got up to in each logfile, and for the next (5 minute) round, it starts looking for matches only from that point onwards. This means, if there's a trigger match in the log file, the client will send it to the server in that round only.

waa-hobbitml＠revpol.com

10:58 p.m.

On 10/27/2014 04:45 PM, Jeremy Laidman wrote:

...

I haven't verified this, but my understanding of how the "logfetch" process works is that it keeps state of where it got up to in each logfile, and for the next (5 minute) round, it starts looking for matches only from that point onwards. This means, if there's a trigger match in the log file, the client will send it to the server in that round only.

J

Yes, my testing over the weekend seemed to indicate that as well. JC Cleaver described the process pretty clearly too.

My problem is that the log file in my example gets appended to once a night, and there are plenty of lines with the "trigger" I need to alert on. In other words, the log is pretty static: when the problem exists, it will exist until the next run 24 hours later, and I would want to keep that Xymon msgs test yellow until it actually cleared up, not for an arbitrary 6 x 5 minute client reports.

Since the msgs test works as you and JC have described, I guess my only option would be to write a short client-side "ZimbraLicense" test which would check the log for the trigger text, and set test color accordingly.

Other ideas? Can I somehow hammer this square peg into a round hole?

Thanks!

Bill

-- Bill Arlofski Reverse Polarity, LLC http://www.revpol.com/ -- Not responsible for anything below this line --

jlaidman＠rebel-it.com.au

28 Oct

12:05 a.m.

On 28 October 2014 09:58, Bill Arlofski <waa-hobbitml at revpol.com> wrote:

...

Other ideas? Can I somehow hammer this square peg into a round hole?

You can create a dynamic file based on the logfile, and alert on that. For example, in client-local.cfg, something like this:

log:`LOG=/tmp/zlic.status; M=$(date +%M); [ $(expr $M % 10) -ge 5 ] && rm -f $LOG; grep "ArchivingAccountsLimit exceeded" /var/log/messages >> $LOG; [ -s $LOG ] && echo "$LOG"`:4096

I'm assuming that /var/log/messages is rotated daily. What happens here is that zlic.status will get the log entries from your current messages file (updated every 5 minutes) appended to it. If there are no log entries, then the filename is not echoed and Xymon will ignore it (so no alerts are possible).

The trick here is that the zlic.status file is emptied only every second run (every 10 minutes) prior to appending the log entries. By shrinking the file size, logfetch thinks the file has been rotated, zeroes its status, and starts looking at the file from the beginning.
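
To make the every-second-run reset concrete, the modulo test from the scriptlet can be tried on its own. This is only a sketch: should_reset is a hypothetical helper, and the sample minutes assume a client that happens to run at minutes 3, 8, 13 and so on. expr is used, as in the scriptlet, so that zero-padded minutes such as 08 are handled as decimal numbers.

```shell
#!/bin/sh
# should_reset MINUTE: succeed when the scriptlet would remove the status
# file before appending, i.e. when the last digit of the minute is 5-9.
should_reset() {
    [ "$(expr "$1" % 10)" -ge 5 ]
}

# Walk through a few consecutive 5-minute client runs.
for m in 03 08 13 18 23 28; do
    if should_reset "$m"; then
        echo "minute $m: remove file, then append (logfetch sees a smaller file)"
    else
        echo "minute $m: append only"
    fi
done
```

With 5-minute runs the two cases alternate, which is what produces the 10-minute reset cycle described above.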

Note that if you get a log entry in your messages file just prior to rotation, then you'll only get an alert between the time the message is detected and the time the messages file is rotated, which could be only a few minutes, or even not at all if the timing isn't favourable. In other words, the alert persists until the next rotation of messages, so it reflects entries from the last 0-24 hours. If you want to go for longer than that, you could perhaps grep from the current and previous messages file, so you're alerting on any messages in the last 24-48 hours.

Another way to do this is to use a "file:" definition, similarly creating a status file and then alarming on the file's size (non-zero indicating an alertable log entry). For example:

file:`LOG=/tmp/zlic.status; grep "ArchivingAccountsLimit exceeded" /var/log/messages >> $LOG; echo $LOG`

Then in analysis.cfg, create a matching entry and alert on size>0. A down-side to this approach is that you get a particularly unhelpful message along the lines of "FILE /tmp/zlic.status red size >0".

A third and similar way to do this is to create a file that exists only if the licencing log entry is not detected. Like so:

file:`LOG=/tmp/zlic.OK; grep "ArchivingAccountsLimit exceeded" /var/log/messages >/dev/null && rm -f $LOG || touch $LOG; echo $LOG`

Then in analysis.cfg, create a matching entry and alert on "noexist".

Yet another way to do this is to use a pseudo-file to generate a status message. For example:

file:`COL=green; MSG="licencing OK"; LOGS=$(grep "ArchivingAccountsLimit exceeded" /var/log/messages); [ "$LOGS" ] && { COL=red; MSG="licencing error"; }; echo "status ${MACHINE}.zlic $COL $(date) $MSG" | $XYMON $XYMSRV @`

There is no output from this pseudo-file, so Xymon will not take any "file" connotations from it and will simply ignore it, except for the side-effects from the $XYMON command that's also run here. This is tantamount to having a client-side ext script, and you may simply prefer to do that. But this can be deployed centrally.
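
For comparison, a standalone client-side ext script along the same lines might look like the sketch below. It has not been run against a real Xymon client: the test name zlic, the log path, and the helper zlic_colour are assumptions carried over from the examples above, and the script only sends a status when the client environment ($XYMON, $XYMSRV, $MACHINE) is present.

```shell
#!/bin/sh
# Sketch of a client-side ext script; names and paths are illustrative.
LOGFILE=/var/log/messages                  # log to inspect (assumed path)
PATTERN="ArchivingAccountsLimit exceeded"  # text the licence script logs

# zlic_colour FILE: print "red" if FILE contains the pattern, else "green".
# A missing or unreadable file is treated as green.
zlic_colour() {
    if grep -q "$PATTERN" "$1" 2>/dev/null; then
        echo red
    else
        echo green
    fi
}

COL=$(zlic_colour "$LOGFILE")
if [ "$COL" = red ]; then MSG="licencing error"; else MSG="licencing OK"; fi

# Send the status only when run under the Xymon client environment.
if [ -n "${XYMON:-}" ] && [ -n "${XYMSRV:-}" ]; then
    $XYMON $XYMSRV "status ${MACHINE}.zlic $COL $(date) $MSG"
fi
```

Because the script reports the current contents of the log on every run, the status stays red until the matching lines are rotated out, rather than for a fixed number of client cycles.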

A few notes:

None of these specific examples have been tested, and may contain syntax errors, but scriptlets like these have been used on production systems.
I deliberately avoided using colons and backticks, because they are interpreted by the logfetch binary, and break the scriptlets.
These scriptlets take up to 15 minutes to start reporting after being added to client-local.cfg. When I'm testing this sort of thing, I like to bring up a xymoncmd shell, paste in the bits between the backticks, and look for errors or unexpected output.

nskyrca＠syr.edu

6:16 p.m.

What I’m seeing is that I get an alert for my trigger string (which has a timestamp on it), and then I keep getting alerts for the same trigger string (with the same timestamp) for the next 30 minutes. I’m not sure if anything else was appended to the log file in that 30 minutes. I stop getting the alerts after 30 minutes and don’t have to wait until the log is rotated for the alert to clear.

Nicole

jlaidman＠rebel-it.com.au

29 Oct

1:37 a.m.

Nicole

On 29 October 2014 05:16, Nicole Beck <nskyrca at syr.edu> wrote:

...

What I’m seeing is that I get an alert for my trigger string (which has a timestamp on it), and then I keep getting alerts for the same trigger string (with the same timestamp) for the next 30 minutes.

How often do you get the repeated alerts? Or how many in that 30 minutes?

...

I’m not sure if anything else was appended to the log file in that 30 minutes. I stop getting the alerts after 30 minutes and don’t have to wait until the log is rotated for the alert to clear.

Do you have ALERTREPEAT defined in xymonserver.cfg? The default is 30 minutes, but you may have set it lower than that.

Similarly, do you have "REPEAT" defined in alerts.cfg for the rule matching these alerts? (The "REPEAT" value in alerts.cfg defaults to the setting of ALERTREPEAT.)

Is your message status (red?) staying non-green for the 30 minutes, or non-green for only a short time, or flapping like red/green/red/green?

Messages get to Xymon via the client data. So during an "event" you can click the "Client data available" link at the bottom of the host's "msgs" page; it should show you all of the client data, and you can search for the logfile name to see which log lines the client sent to the server. Or you can click the logfile name on the "msgs" page for a modified client data report showing just the log lines for that logfile.

What I'm trying to understand is whether you are getting the same messages sent multiple times from the client causing multiple events, or whether the one event is generating multiple alerts.

...

From what I can tell, a red "msgs" status will stay red for only one 5-minute client cycle. The next time the client sends its client data report, if the logfile in question has no new matching lines, it will actively generate a green status.
