Metrics reports on red/yellow duration? Unacked? Splunk?
My grand-boss is looking to set some standards for how long we let reds and yellows go un-ACKed and unresolved. There's a built-in report, but it seems to summarize total time red/yellow, and what we're really interested in is how long it's taking us to respond.
Has anyone done anything with this?
I'm wondering if feeding the ack logs into Splunk would let us work something up. I'm also thinking about just trying to scrape this off the board.
Thoughts and code snippets welcome
I do this in an alert script:
ACTIVE=$(/home/xymon/server/bin/xymon 0 "xymondlog $BBHOSTSVC" | head -1 | awk -F\| '{print "@"$5}' | xargs date -d)
NOW=$(date '+%s')
ALERTACTIVE=$(/home/xymon/server/bin/xymon 0 "xymondlog $BBHOSTSVC" | head -1 | awk -F\| '{print $5}')
ACTIVECOLOR=$(/home/xymon/server/bin/xymon 0 "xymondlog $BBHOSTSVC" | head -1 | awk -F\| '{print $3}')
ALERTDIFF=$(expr "$NOW" - "$ALERTACTIVE")
ALERTTIME=$(echo - | awk -v S="$ALERTDIFF" '{printf "%d hours %d minutes",S/(60*60),S%(60*60)/60}')
Which eventually shows up like this in our email alert:
Alert Active Since: Tue Nov 12 11:28:52 CST 2013 (Duration of Alert 4 hours 1 minutes)
You could use the same logic to get what you want.
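The same first line of the xymondlog response can also tell you whether a status is still waiting for an ack. Here is a minimal sketch, not something from my alert script. It assumes the pipe-delimited field order described in xymon(1): color in field 3, lastchange in field 5, and acktime in field 8 holding the epoch time at which the ack expires. The function name unacked_seconds is made up for illustration, so check the field positions against your Xymon version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print how many seconds a status has been in its
# current color with no ack in effect. Prints nothing if it is green or acked.
# $1 = first line of a "xymondlog host.test" response
# $2 = current epoch time
# Assumption: field 3 = color, field 5 = lastchange, field 8 = ack expiry time.
unacked_seconds() {
    printf '%s\n' "$1" | awk -F'|' -v now="$2" '
        # An ack counts as "in effect" only if its expiry is in the future.
        $3 != "green" && $8 <= now { print now - $5 }'
}
```

You would feed it the output of the xymon command used above, for example by storing the head -1 result in a variable and passing it along with the value of date '+%s'.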
Thanks, John Upcoming PTO: None
John Rothlisberger IT Strategy, Infrastructure & Security - Technology Growth Platform TGP for Business Process Outsourcing Accenture 312.693.3136 office
From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of Betsy Schwartz Sent: Wednesday, November 13, 2013 8:20 AM To: xymon at xymon.com Subject: [Xymon] Metrics reports on red/yellow duration? Unacked? Splunk?
This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited.
Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.
www.accenture.com
Belatedly - what I'm thinking about is how to get metrics reports across the organization, for example "average time to ack yellows" or "time from ack to resolution".
I see that the data about color changes is in $XYMONHOME/data/hist, stored by host-test, and the data about acks is in $XYMONHOME/log/acknowledge.log, so I'm thinking we can put that together with Splunk.
Alternately, the board knows about color and acktime, so it's possible to get realtime stats as below ("this alert has been yellow for N minutes"), but there's nothing to put that together over time, which is why I'm thinking Splunk.
It would be great if Xymon's built-in reports knew about "ACK". We're very ack-driven around here.
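To make the "average time to ack" idea concrete, here is a rough sketch of the join itself, independent of Splunk. It deliberately does not parse the history files or acknowledge.log directly. Instead it assumes you have already merged both into lines of the form "<epoch> <host.test> <event>", where <event> is either a color name or the word "ack". That merged format and the function name avg_time_to_ack are assumptions made for illustration only. The sketch also treats every non-green color as an alert, which may not match how you want to count blue or purple.

```shell
#!/bin/sh
# Hypothetical helper: average seconds from "went non-green" to "first ack".
# Reads "<epoch> <host.test> <event>" lines on stdin, where <event> is a color
# name or the word "ack". This merged input format is an assumption; it has to
# be produced separately from the history data and the ack log.
avg_time_to_ack() {
    sort -n | awk '
        $3 == "ack" {
            # Count only the first ack of each non-green episode.
            if (($2 in start) && !($2 in acked)) {
                total += $1 - start[$2]; n++; acked[$2] = 1
            }
            next
        }
        # Going green closes the episode for this host.test.
        $3 == "green" { delete start[$2]; delete acked[$2]; next }
        # Any other color opens an episode unless one is already running.
        !($2 in start) { start[$2] = $1 }
        END { if (n) printf "%d acks, average %d seconds to ack\n", n, total / n }'
}
```

Episodes that are never acked are simply left out of the average here. Counting them separately would be a natural next step if un-ACKed alerts are the real concern.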
(In reply to the message from john.r.rothlisberger at accenture.com of Wed, Nov 13, 2013 at 9:50 AM, quoted in full above.)
Interesting... I just finished working on a Perl script that notifies us when someone acks an alert. This may not be exactly what you are looking for, but you can use it or change it to your liking.
The script is attached. No, it's not perfect, and there are probably lots of things that could be done differently... but it works. You will need to change the following lines to suit your needs:
From => '<sender>@<company>.com',
To => '<recipient>@<company>.com',
I also just
I have named it ack_watch.pl and run it via cron every 5 minutes:
*/5 * * * * /home/xymon/bin/ack_watch.pl > /dev/null 2>&1
It looks at the epoch time and duration in the acknowledge.log file and checks to see if the ack end time is greater than the current time. If it is, it will generate an email that looks like this:
Report Time: 11/26/2013 08:30
Xymon Server:
The following alert(s) were recently acknowledged.
Server/Test: attbbydb1.msgs
Ack at: 11/26/2013 08:23
Ack ends: 11/29/2013 23:23
Ack duration: 3 days 15 hours
Alert color: yellow
Ack reason: ACK TEST ONLY 87 hours
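The check described above (ack time plus duration compared with the current time) could be sketched in shell like this. It is not the attached Perl script, just an illustration of the same test. The function name ack_still_active is made up, and the sketch assumes the duration is given in minutes, so confirm the unit in your own acknowledge.log first.

```shell
#!/bin/sh
# Hypothetical helper: succeed and print the ack duration if the ack is
# still in effect, otherwise print nothing and return 1.
# $1 = epoch time the ack was given
# $2 = ack duration in minutes (assumed unit)
# $3 = current epoch time
ack_still_active() {
    ack_end=$(( $1 + $2 * 60 ))
    # Expired acks are skipped.
    [ "$ack_end" -gt "$3" ] || return 1
    days=$(( $2 / 1440 ))
    hours=$(( ($2 % 1440) / 60 ))
    echo "Ack duration: $days days $hours hours"
}
```

With an 87-hour ack (5220 minutes), this prints the same "3 days 15 hours" shown in the sample email.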
It will also create a temporary file named from the ack time plus the alert ID, which is used only to avoid sending duplicate emails for the same ack. (Create a directory called ~server/tmp/ACK_WATCH to hold these files.)
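The marker-file idea can be sketched in a few lines of shell. Again, this is an illustration rather than the Perl code itself. The function name first_time_seen and the "<acktime>.<id>" file naming are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed only the first time a given ack is seen.
# $1 = marker directory, $2 = ack time (epoch), $3 = alert id
first_time_seen() {
    marker="$1/$2.$3"
    # A marker already exists: this ack has been reported before.
    if [ -e "$marker" ]; then
        return 1
    fi
    # Leave an empty marker so the next cron run skips this ack.
    : > "$marker"
}
```

A cron-driven script would call this before sending mail and skip the ack whenever the function returns non-zero.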
To keep the script from parsing through a long history of acks I have set it up so that after 10 acks are in acknowledge.log the file is moved to an archive directory.
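For the archiving step, a sketch along these lines would do the move once the file reaches a given number of lines. The function name rotate_if_full and the timestamped archive name are assumptions. Whether Xymon needs to be told to reopen acknowledge.log after the move is not covered here, so check that before trying it on a live server.

```shell
#!/bin/sh
# Hypothetical helper: move a log file into an archive directory once it
# holds at least a given number of lines.
# $1 = log file, $2 = archive directory, $3 = line threshold
rotate_if_full() {
    [ -f "$1" ] || return 0
    # Arithmetic expansion strips any padding some wc versions add.
    lines=$(( $(wc -l < "$1") ))
    if [ "$lines" -ge "$3" ]; then
        mv "$1" "$2/$(basename "$1").$(date '+%Y%m%d%H%M%S')"
    fi
}
```

Called with a threshold of 10, this matches the "after 10 acks" behaviour described above, assuming one ack per line.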
I don't know if this is the direction you were looking to go but it seemed appropriate.
Thanks, John
From: Betsy Schwartz [mailto:betsy.schwartz at gmail.com] Sent: Tuesday, November 26, 2013 9:42 AM To: Rothlisberger, John R. Cc: xymon at xymon.com Subject: Re: [Xymon] Metrics reports on red/yellow duration? Unacked? Splunk?
participants (2)
- betsy.schwartz@gmail.com
- john.r.rothlisberger@accenture.com