Xymon + graphite


Galen.Johnson＠sas.com

7 Dec 2015

6:48 p.m.

Hey,

Has anyone tried to integrate alerting based on Graphite? Or used Graphite as a trending replacement to rrd? I love Xymon for my monitoring but the limitations and aggregations of rrds are starting to become an issue.

thanks

=G=

NB: I was sent this link today, http://blog.takipi.com/graphite-vs-grafana-build-the-best-monitoring-archite..., and on the Graphite page there is this link under monitoring that I thought I might consider modifying for Xymon: https://github.com/blacked/graphite-to-zabbix.

=G=


jlaidman＠rebel-it.com.au

10 Dec 2015

10:49 a.m.

On Tue, Dec 8, 2015 at 5:49 AM Galen Johnson <Galen.Johnson at sas.com> wrote:

...

Has anyone tried to integrate alerting based on Graphite? Or used Graphite as a trending replacement to rrd? I love Xymon for my monitoring but the limitations and aggregations of rrds are starting to become an issue.

Nope, but I'm intrigued by Graphite. Most of my servers have enormously long trends pages because of all the extra graphs I've added. These are indispensable for tracking down weird faults. But the number of graphs and RRD files has become unwieldy. One major shortcoming is that I can't put metrics from different hosts onto the same graph. I've used RRGrapher <http://pages.cs.wisc.edu/~plonka/RRGrapher/> to let me create ad-hoc graphs like this, but it's obviously from last millennium, and could do with a facelift.

I'd also like to integrate Smokeping into my monitoring service. Having multiple interfaces into the suite of RRD files makes for a less-than-intuitive user experience.

For trending, Xymon can threshold (alert) on RRD files with the "DS" operator in analysis.cfg. Perhaps this can be extended to alert on Holt-Winters aberrant behaviour thresholds. Doing the same sort of thing with a rewrite of the g2zproxy probably wouldn't be too difficult, at least not on the Xymon side.
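To make the Holt-Winters idea concrete, here is a minimal sketch of aberrant-behaviour detection in plain Python. It is a simplified illustration, not what RRDtool or Xymon implements: the function name `hw_aberrant`, the smoothing parameters, the seeding from the first season and the per-point flagging rule are all assumptions chosen for brevity.

```python
from typing import List


def hw_aberrant(series: List[float], period: int, alpha: float = 0.5,
                beta: float = 0.5, gamma: float = 0.5,
                delta: float = 3.0) -> List[int]:
    """Return the indexes whose value leaves the Holt-Winters confidence band.

    Hypothetical helper for illustration only.
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    # Seed the model from the first season (an illustrative choice).
    first = series[:period]
    level = sum(first) / period
    trend = 0.0
    seasonal = [y - level for y in first]
    # Deviation starts at zero, so the band only widens once
    # prediction errors have been observed.
    deviation = [0.0] * period

    flagged = []
    for t in range(period, len(series)):
        y = series[t]
        slot = t % period
        predicted = level + trend + seasonal[slot]
        # Flag the point if it falls outside predicted +/- delta * deviation.
        if abs(y - predicted) > delta * deviation[slot]:
            flagged.append(t)
        # Additive Holt-Winters updates for level, trend and seasonal terms,
        # plus the smoothed absolute prediction error for this slot.
        prev_level = level
        level = alpha * (y - seasonal[slot]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[slot] = gamma * (y - level) + (1 - gamma) * seasonal[slot]
        deviation[slot] = (gamma * abs(y - predicted)
                           + (1 - gamma) * deviation[slot])
    return flagged
```

A production check would more likely count violations over a sliding window before alerting, rather than flagging single points as this sketch does.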

cleaver＠terabithia.org

4:02 p.m.

On Thu, December 10, 2015 2:49 am, Jeremy Laidman wrote:

...

On Tue, Dec 8, 2015 at 5:49 AM Galen Johnson <Galen.Johnson at sas.com> wrote:

...
Has anyone tried to integrate alerting based on Graphite? Or used Graphite as a trending replacement to rrd? I love Xymon for my monitoring but the limitations and aggregations of rrds are starting to become an issue.

Nope, but I'm intrigued by Graphite. Most of my servers have enormously long trends pages because of all the extra graphs I've added. These are indispensable for tracking down weird faults. But the number of graphs and RRD files has become unwieldy. One major shortcoming is that I can't put metrics from different hosts onto the same graph. I've used RRGrapher <http://pages.cs.wisc.edu/~plonka/RRGrapher/> to let me create ad-hoc graphs like this, but it's obviously from last millennium, and could do with a facelift.

I'd been looking at http://www.flotcharts.org/ and a few other RRD graphing packages that could be used to provide a more browsable interface. There's absolutely a need (aside from the CSS work and a potential "dashboard" view generally) for improved multi-host and multi-graph views besides the linear trends output, I agree.

...

For trending, Xymon can threshold (alert) on RRD files with the "DS" operator in analysis.cfg. Perhaps this can be extended to alert on Holt-Winters aberrant behaviour thresholds. Doing the same sort of thing with a rewrite of the g2zproxy probably wouldn't be too difficult, at least not on the Xymon side.

(Actually, the RRD files generated on new RPM installs have had HWPREDICT, SEASONAL, and a few other RRAs configured for a while now, if anyone feels like experimenting...)

One problem with the current RRD paradigm is that alerting happens only on data available at insertion time, not on data already stored in the RRD file (or whatever metric store you have), so xymond_rrd can't efficiently alert on anything beyond that.

A "xymond_trend" could operate asynchronously on the RRD files, but to get useful trend data back out of RRDs you'll need to flush the data to disk first, which more or less blows out your I/O performance. Fine if you're on SSD, but more of a problem if you're on heavily loaded spinning disks.
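As a rough illustration of what alerting on stored data (rather than on the single value seen at insertion time) could look like, the sketch below fits a straight line to already-stored samples and estimates how long until an upper limit is reached. The function name `seconds_until_limit` and the `(timestamp, value)` input shape are assumptions; how the samples are read back out of the RRD files or another metric store is deliberately left out.

```python
from typing import List, Optional, Tuple


def seconds_until_limit(samples: List[Tuple[float, float]],
                        limit: float) -> Optional[float]:
    """Estimate the seconds from the last sample until the fitted trend
    reaches an upper limit.

    Returns 0.0 if the fitted line is already at or above the limit, and
    None if there is too little data or the trend is flat or falling.
    Hypothetical helper for illustration only.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp
    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    current = slope * samples[-1][0] + intercept
    if current >= limit:
        return 0.0
    if slope <= 0:
        return None
    return (limit - current) / slope
```

For example, a disk that grows by one percentage point every 300 seconds from 50% would be projected to reach 95% roughly 12600 seconds after the last of four samples; a trend checker could turn that estimate into a yellow or red status.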

The problem there is that there are so many different ways of doing this, serving a lot of different needs. Making something flexible enough would require a good survey of what people are looking for.

(With that in mind -- What are people looking for? :) Maybe it's easier than I'm thinking.)

Alternatively, sending the metric data off entirely to a different package, which can reinject an alert into xymon if/when it notices a trend, is an easily-approachable option using the RRD --processor option, which can fork your metric feed off to whatever you like (OpenTSDB, graphite, splunk, etc...). The re-posting of alerts back into xymon can be done with that package's notification tool set and some scripting of xymon messages.
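The two halves of that round trip can be sketched in a few lines of Python. This assumes the samples have already been parsed into host, test, dataset, value and timestamp; the exact format in which the --processor feed delivers metrics is not shown in this thread and is not modelled here. The `xymon.` path prefix and the helper names are arbitrary choices for illustration. The Graphite side uses Carbon's plaintext line format (`path value timestamp`), and the Xymon side uses a standard `status` message.

```python
import socket


def graphite_line(host: str, test: str, ds: str,
                  value: float, ts: int) -> str:
    """Format one sample for Graphite's plaintext protocol."""
    # Dots separate path components in Graphite, so dots in the hostname
    # are replaced; the 'xymon.' prefix is an arbitrary naming choice.
    path = "xymon.{}.{}.{}".format(host.replace(".", "_"), test, ds)
    return "{} {} {}\n".format(path, value, ts)


def xymon_status(host: str, test: str, colour: str, text: str) -> str:
    """Build a Xymon 'status' message; hostname dots become commas."""
    return "status {}.{} {} {}".format(
        host.replace(".", ","), test, colour, text)


def send(message: str, server: str, port: int) -> None:
    """Send one message over TCP.

    Carbon's plaintext listener is commonly on port 2003 and the Xymon
    daemon on port 1984; both are passed in rather than assumed.
    """
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(message.encode("utf-8"))
```

A notification hook in the external package could then call something like `send(xymon_status("www.example.com", "trend", "yellow", "load rising"), "xymon.example.com", 1984)`, where the hostnames and the `trend` column name are placeholders.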

Regards, -jc

bferrell＠baywinds.org

5:05 p.m.

On 12/10/2015 08:02 AM, J.C. Cleaver wrote:

...

Having done a bit of this type of thing in another life, what you're discussing is what we termed an alert manager/data collector architecture. The entire beauty of RRD data storage is its simplicity and that it does rollups automatically.

I started my charting using flot, and because of the complexity of managing JS charting across all the different browsers, I eventually scrapped JS charting entirely and used GD to generate chart images. For that particular use case, RRD didn't make sense, as exact storage of historical data was mandatory... rollups/data averaging were not allowed.

john.thurston＠alaska.gov

5:24 p.m.

On 12/10/2015 7:02 AM, J.C. Cleaver wrote:

...

Alternatively, sending the metric data off entirely to a different package, which can reinject an alert into xymon if/when it notices a trend, is an easily-approachable option using the RRD --processor option, which can fork your metric feed off to whatever you like (OpenTSDB, graphite, splunk, etc...). The re-posting of alerts back into xymon can be done with that package's notification tool set and some scripting of xymon messages.

This seems like the "Xymonish" way to do this.

Attempting to embed cross-host, historic-metric, and trending analysis seems to stray pretty far from the Big Brother/Xymon tradition: "Give me a red/yellow/green message and I'll put it on a web page and send an email to whomever you have specified." (See my sig-line)

-- Do things because you should, not just because you can.

John Thurston 907-465-8591 John.Thurston at alaska.gov Enterprise Technology Services Department of Administration State of Alaska
