[hobbit] conn alerts based on ping time
Sounds like they need to throw in MRTG and go red when the traffic is high on the link.
And then throw in things like
bb-ospf.pl to check that ospf is not flapping over the link
bb-xsnmp.pl to check out the routers at each end and the interfaces
You can also use http to a reliable server on the remote side as part of the link test. Just make the http test for the link dependent on the router and the conn test to the web server.
From: Charles Jones [mailto:jonescr at cisco.com]
Sent: Friday, January 13, 2006 1:01 PM
To: hobbit at hswn.dk
Cc: crimson at technologist.com
Subject: [hobbit] conn alerts based on ping time
I'm helping someone set up Hobbit at their company, and they want to monitor the status of a remote office T1 link. Of course Hobbit can tell them if the link goes totally down, or you can ignore bad pings with "badconn", but they want to know when the link is slow, as they often have periods of time when the pings are not dropped, but instead taking 1-3 seconds (instead of <100ms like normal).
Is there any chance that Hobbit will soon support comparing the ping replies to specified values for green, yellow, and red?
Something like:
1.2.3.4 myhost.com # conn:200:500
This would make myhost.com's conn test go yellow if the ping was between 200 and 500ms, and red if it was over 500ms. Since hobbit already graphs the numeric values of the ping replies, this seems like it would be fairly easy to add?
-Charles
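The proposed tag reads naturally as a pair of thresholds. A minimal Python sketch of that mapping follows. It assumes the `conn:200:500` syntax from the message above, which is a proposal and not an existing Hobbit option, and the function names are hypothetical:

```python
def parse_conn_thresholds(tag):
    """Split a proposed 'conn:WARN:CRIT' tag into two millisecond thresholds."""
    name, warn, crit = tag.split(":")
    if name != "conn":
        raise ValueError("not a conn tag: " + tag)
    warn_ms, crit_ms = float(warn), float(crit)
    if warn_ms > crit_ms:
        raise ValueError("yellow threshold must not exceed red threshold")
    return warn_ms, crit_ms


def conn_color(ping_ms, warn_ms, crit_ms):
    """Green below warn_ms, yellow from warn_ms through crit_ms, red above."""
    if ping_ms > crit_ms:
        return "red"
    if ping_ms >= warn_ms:
        return "yellow"
    return "green"
```

With the thresholds from the example line, a normal sub-100 ms reply stays green, while the 1-3 second replies described above would go red.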
Deal, Richard wrote:
Sounds like they need to throw in MRTG and go red when the traffic is high on the link.
And then throw in things like
bb-ospf.pl to check that ospf is not flapping over the link
bb-xsnmp.pl to check out the routers at each end and the interfaces
Yeah, I'm aware of the existence of bb-mrtg.pl, although I have never set it up. I guess I was hoping that Hobbit could natively support ping testing rather than having to install mrtg and hack stuff in. It's sort of confusing for a newbie when you are showing them the ropes of Hobbit and start bringing external scripts into the mix (especially ones that require modifying before they will work).
You can also use http to a reliable server on the remote side as part of the link test. Just make the http test for the link dependent on the router and the conn test to the web server.
That won't work in this case, as all of the company's servers are in a CoLo, Hobbit is running at the CoLo, and they want to test the T1 link at the office from the CoLo (there are no servers on the other side of the office T1 to do a test against). Even if there were, it still would not give them a heads-up on the T1 being slow/saturated, as Hobbit only alerts when the conn test outright fails.
-Charles
Really, honestly, I'm not trying to belabor a point here, but you need to be careful as the ping only runs every 5 minutes, so even if you could get this alerting to work, the link would have to be slow during a ping cycle. So it could possibly be slow for 4 minutes, recover, and the page wouldn't happen, as the ping time would be ok. Assuming the client saw the slowness during those 4 minutes via other methods, they would then question why hobbit didn't see it.
Same thing happens to me with spikes in network traffic between polling periods: I don't see them.
With MRTG, you can shorten the time to 1 minute. MRTG integration with hobbit isn't too hard, so that's probably the route you should go.
-Jeff
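The sampling gap Jeff describes is easy to check with a little arithmetic. The Python sketch below is only an illustration with a made-up function name and made-up times. It lists which polls land inside a slow period for a given polling interval:

```python
def polls_in_window(first_poll_s, interval_s, slow_start_s, slow_end_s):
    """Return the poll times (in seconds) that fall inside the slow period.

    The slow period is the half-open range [slow_start_s, slow_end_s).
    """
    hits = []
    t = first_poll_s
    while t < slow_end_s:
        if t >= slow_start_s:
            hits.append(t)
        t += interval_s
    return hits


# A 4-minute slow period starting 30 seconds after a poll (illustrative numbers).
five_minute_hits = polls_in_window(0, 300, 30, 270)
one_minute_hits = polls_in_window(0, 60, 30, 270)
```

With a 300-second interval no poll falls inside the slow period, so nothing would be reported; with a 60-second interval four polls do.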
Jeff,
That's a very good point. Do you know if anyone has documented setting up MRTG with Hobbit? I searched the mailing list archives and didn't find anything concise.
I will probably end up recommending and implementing the MRTG solution, but I still think it should be trivial for hobbit to alert on the ping response, since it already collects that data. I guess a real solution would be for hobbit to come with an MRTG module "out of the box" so that users didn't have to delve into the knowledgebase and/or depend on places like deadcat to find and provide the functionality they need.
I myself don't mind using external scripts and having to tinker with something to get it to work the way I want, but it's hard enough to sell management on Hobbit over commercial and well known tools like Nagios, without having to reveal that you need to spend a day downloading external scripts and making them work in order to get the functionality that they expect (and that they think that the commercial tools already have).
I believe that a well-setup hobbit monitor is superior to Nagios and other tools I have tested and been forced to use over the years. But the fact that a lot of the application-specific monitoring (mysql, oracle, postgres, etc), as well as traffic monitoring (MRTG), is handled by third-party scripts that you have to meld into your server probably scares away a lot of people, especially management types who have security folks whispering in their ear to never trust third-party modules and especially not code written by "joe-user from some website" (a manager actually said that to me once). As of yet Hobbit does not even have a fully functional client (no logfile parsing), so we have to use either the bb-client or the bb-msgs script... more third party plugins.
I'm not sure where I'm going with this, I guess what I'm saying is I would like to see Hobbit come with built-in support for monitoring common applications and services (besides the basics). It's already partway there as Hobbit can natively check things like mysql, but what about postgres, oracle?
Henrik is a busy guy I am sure, and he probably doesn't get much compensation for all the fine work he does on Hobbit, nor does he ask for any (I did buy him one of his wishlist items, I hope others do as well). As far as I know, Henrik has nobody helping him, except for seeing him mention someone was working on a new Hobbit client. Maybe what we need is more people to roll up their sleeves and write some modules that are compatible with hobbit with little or no tweaking. Sadly I'm no C/C++ guru, but I am pretty good with Perl :-)
I think also perhaps we need an "official" repository of scripts that work with Hobbit, so when someone needs an addon, they can grab an already Hobbit-ized one, instead of going to deadcat and getting a script to hack on. Also a Wiki might be handy, so that Hobbit users can easily share and update information on various Hobbit setups and problems.
Okay, I have written WAY more than I intended here, I'm so far off topic now that I will edit the subject line as a warning :)
-Charles
Charles,
hobbit is already pretty much integrated with MRTG. Sure, you have to install MRTG and configure the mrtg.cfg, but that's pretty simple. Other than that, hobbit is already coded to look for MRTG RRD files. See the tips&tricks section under help. There is a document that describes setting up MRTG with hobbit. I also looked at bb-mrtg and found it daunting to try and figure out how it all works together with hobbit, so I stuck with the standard integration. I also didn't like the fact that plain MRTG with bb-mrtg seemed to not use RRD and stuck with the default way MRTG works (à la lots of .png files), but I could be wrong about that since I have never done it.
Anyway, in short, putting MRTG (a well known, easy to install program) on the hobbit server and following the instructions in the help pages took me less than 10-15 minutes to get up and running (I had already downloaded the Linux MRTG rpm and installed it).
-Jeff
I started replying to Charles' mail a couple of days ago, then waited for a few days before coming back to it. This does cover a lot of different areas so I'm rambling a bit, but bear with me for continuing off the tangent that Charles set out on.
On Fri, Jan 13, 2006 at 02:59:06PM -0700, Charles Jones wrote:
I believe that a well-setup hobbit monitor is superior to Nagios and other tools I have tested and been forced to use over the years. But the fact that a lot of the application-specific monitoring (mysql, oracle, postgres, etc), as well as traffic monitoring (MRTG), is handled by third-party scripts that you have to meld into your server probably scares away a lot of people, especially management types who have security folks whispering in their ear to never trust third-party modules and especially not code written by "joe-user from some website" (a manager actually said that to me once). As of yet Hobbit does not even have a fully functional client (no logfile parsing), so we have to use either the bb-client or the bb-msgs script... more third party plugins.
True. It's one of the things that need to be fixed soon.
I understand your management concerns - I've heard the same stuff over the years. I am fortunate to have a superior who saw the potential for BB and then Hobbit, because Unicenter/TNG couldn't do what we needed - and stuck with that decision for three years until everyone could see that Hobbit is a very good solution to our monitoring needs.
One thing that I've learned over the years is that to get your management interested in Hobbit, you must show them something that they can see is useful. Geeks like myself often get lost in the wonders of technology - "look, it can match the login JSP against this regular expression so we can see if all of the EJB resources are OK!" ... forget it when talking to bosses. What they want is graphs, reports, and the knowledge that whenever something happens that might affect their bonus, one of the techies will be alerted and take action.
Big Brother was a "geek thing" when I started using it. OK, it was better than Unicenter because you could check if the websites we hosted were OK without any special monitoring console installed. But it didn't *really* catch on until I added LARRD to get the response-time graphs, and started generating reports that showed the monthly availability. That's when management found out that "hey, this looks good" and gave the OK for me to work on it. I still nurse them a bit on occasion - just last week, they complained that a particular kind of reporting to the customer was difficult, so I came up with a tool that pulled all of the data from Hobbit and presented it so that it could be cut-and-pasted into a Word document. A simple one-hour job, but it gives immense credit.
So - listen to your PHB's and try to figure out what it is that they *really* want for a killer feature. If it turns out Hobbit cannot do that out of the box, speak up - it's happened more than once that an essential improvement required only a few lines of code in the right spot. Showing people how quickly Open Source tools can adapt to *your* needs is pretty powerful.
I'm not sure where I'm going with this, I guess what I'm saying is I would like to see Hobbit come with built-in support for monitoring common applications and services (besides the basics). It's already partway there as Hobbit can natively check things like mysql, but what about postgres, oracle?
Would be nice, yes. It can do the Oracle TNS listener, but detailed Oracle checks are missing. Or PostgreSQL, for that matter.
I'm a bit cautious about making the "core" Hobbit tools know everything about anything out there. It would be a maintenance nightmare since there would be lots of stuff that I have no way of testing myself. The add-on mechanism is a bit more work for the admin, but I think people need monitoring for lots of very diverse stuff, and trying to cover all of it would just result in lots of not-very-good solutions. So what I prefer to do is to make Hobbit flexible enough that writing add-ons is easy, and therefore the people who *know* about what's interesting to look at in a DB2 database installation (or whatever it is they want to monitor) can put in the missing pieces and come up with a good monitoring solution.
It's also a lot easier to weed out the bugs in an add-on module, so if it turns out to be generally useful, I have something that has been debugged which I can merge into Hobbit as part of the core toolset.
This brings us to the issue of a repository for Hobbit add-ons:
One thing I really could need some help with is setting up a web repository like deadcat for Hobbit things. Sourceforge could host it, I suppose, but it would need someone to set it up and manage submissions - I don't think Sourceforge has anything automated like deadcat.
Henrik is a busy guy I am sure, and he probably doesn't get much compensation for all the fine work he does on Hobbit, nor does he ask for any (I did buy him one of his wishlist items, I hope others do as well). As far as I know, Henrik has nobody helping him, except for seeing him mention someone was working on a new Hobbit client. Maybe what we need is more people to roll up their sleeves and write some modules that are compatible with hobbit with little or no tweaking.
Thanks :-) Yes, I am fairly busy - got a job to attend to on occasion - so you are right that I could use some help with add-ons for Hobbit.
A colleague of mine *is* working on the Win32 client, so that is well underway. He's got some data flowing now, but there are still some pieces missing.
Sadly I'm no C/C++ guru, but I am pretty good with Perl :-)
I'm just the opposite :-) But that shouldn't keep you back - as long as you "only" write add-on modules there's a great deal of freedom in choosing your tools. Even Hobbit server modules - those that get their input directly from the hobbitd daemon - can be written in any language, since the interface is just reading standard-input. Add-on tests obviously just use the "bb" commandline tool to send their results.
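As a rough illustration of an add-on test written outside C, here is a Python sketch that formats a status message and hands it to the `bb` tool. It assumes the Big Brother-style message layout (`status HOST.TEST COLOR text`, with the dots in the hostname written as commas) and that the script runs with the `BB` and `BBDISP` environment variables set, as extension scripts conventionally do; check the `bb` man page of your installation before relying on either. The function names and the `linkspeed` test name are made up for the example.

```python
import os
import subprocess
import time


def build_status(hostname, testname, color, text):
    """Format a BB-style status message (assumed layout, see the note above)."""
    machine = hostname.replace(".", ",")  # assumed convention: commas for dots
    timestamp = time.strftime("%a %b %d %H:%M:%S %Y")
    return "status {}.{} {} {} {}".format(machine, testname, color, timestamp, text)


def send_status(message):
    """Hand the message to the bb command-line tool.

    Assumes BB holds the path to the bb binary and BBDISP the address of
    the display server; both are read from the environment.
    """
    subprocess.run([os.environ["BB"], os.environ["BBDISP"], message], check=True)
```

An add-on would then call something like `send_status(build_status("myhost.com", "linkspeed", "yellow", "ping time 340 ms"))` on each run.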
I do appreciate anyone helping with improving Hobbit. But I am also concerned about becoming a bottleneck for getting things published. That's why I would like to have this repository setup so there's an easy way of publishing add-ons, without having to wait for me. If some of you want to gang up and do something together, I can quickly setup a dedicated mailing list for you - if that makes it easier.
(There are over 300 addresses on the hobbit mailing list now ... a year ago, I was proud when it passed the subscription #100. And almost 1500 downloads of the latest version from sourceforge - it's getting big).
The past 18 months have been pretty intense - Hobbit has developed very quickly. I think that will continue for another year or so; there is some stuff I see as needing work right now:
- the client package needs logfile monitoring badly.
- the alert/acknowledge mechanism needs improving, so it can handle things like escalating alerts and different groups acknowledging an alert (this is in fact already being worked on).
- the graph displays (which graphs go on which pages) need an overhaul. The current system is a bit of a mess, and not flexible enough.
- I want to be able to trigger status-changes (and hence alerts) from the data that only goes into the graphs, currently. E.g. instead of the CPU alert triggering if the load average goes above 5 (which is pretty meaningless nowadays), I'd like it to trigger a warning if the %system time exceeds 20, or the %idle goes below 10. Or a "conn" alert if pingtime exceeds 250 ms. (I have a pretty solid idea about how this can be implemented - and it's elegant enough that it would also work for data from custom graphs).
- And I'd like to make the webpages 100% dynamic and ditch the statically generated overview pages. Which could mean that Hobbit would require something like PHP for the display part, or that I need to learn about how XML/XSLT etc. works.
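
The idea of triggering status changes from graph data can be pictured as a small rule table. Henrik does not describe his design here, so the Python sketch below is only one possible reading: the metric names, the table layout, and the colors assigned to each rule are all assumptions, and only the thresholds (20, 10, 250 ms) come from the list above.

```python
import operator

# Hypothetical rule table: (metric name, comparison, threshold, color to raise).
# The colors are assumptions; the message only gives the threshold values.
RULES = [
    ("cpu_system_pct", operator.gt, 20, "yellow"),
    ("cpu_idle_pct", operator.lt, 10, "yellow"),
    ("ping_ms", operator.gt, 250, "red"),
]


def evaluate(sample, rules=RULES):
    """Return (worst color, names of triggered metrics) for one data sample."""
    severity = {"green": 0, "yellow": 1, "red": 2}
    color = "green"
    triggered = []
    for metric, compare, threshold, rule_color in rules:
        if metric in sample and compare(sample[metric], threshold):
            triggered.append(metric)
            if severity[rule_color] > severity[color]:
                color = rule_color
    return color, triggered
```

Because the rules only refer to metric names, the same evaluation would apply to values taken from custom graphs.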
So I won't get bored right away, but I do hope that development could slow down just a bit and there would be more activity just to broaden the range of systems/applications/whatever that Hobbit can monitor.
Henrik
Participants (4): henrik@hswn.dk, jeffnewman75@gmail.com, jonescr@cisco.com, rdeal@tigr.ORG