I'm having problems with the HTTP test.
Yesterday, I added additional URL's to monitor.
We monitor a load of URL's, and yesterday, I added a lot.
After adding the additional URL's, I started getting timeouts
across the board. If I look at the graph for old URL's, it
shows the response time increasing at the same time I added
all the new URL's yesterday.
i.e., I went from about 30 to 60 URL's.
Does Hobbit stagger URL checks? Is there a way to stagger them?
Thanks.
James
On 11/16/06, James Wade <jkwade at futurefrontiers.com> wrote:
i.e., I went from about 30 to 60 URL's.
Does Hobbit stagger URL checks? Is there a way to stagger them?
I don't know about Hobbit staggering URL checks, but I'm doing a whole bunch more than 60 using scripts fired off by cron. Most of them repeat every ten minutes, some every five mins or every minute, and I've spread them out using cron settings.
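Spreading checks out with cron, as described above, comes down to giving each script a different minute offset within its repeat interval. A minimal sketch, assuming every script repeats on the same interval; the function names and script paths are made up for illustration and are not part of Hobbit or Big Brother:

```shell
#!/bin/sh
# Print crontab lines that spread check scripts across a repeat interval,
# so that they do not all start in the same minute.
# Function names and script paths are illustrative assumptions.

# minute_list OFFSET INTERVAL -> comma-separated minutes, e.g. "3,13,23,33,43,53"
minute_list() {
    m=$1
    list=$m
    m=$((m + $2))
    while [ "$m" -lt 60 ]; do
        list="$list,$m"
        m=$((m + $2))
    done
    printf '%s\n' "$list"
}

# stagger_crontab INTERVAL SCRIPT... -> one crontab line per script,
# each shifted one minute later than the previous one (wrapping at INTERVAL).
stagger_crontab() {
    interval=$1
    shift
    i=0
    for script in "$@"; do
        offset=$((i % interval))
        printf '%s * * * * %s\n' "$(minute_list "$offset" "$interval")" "$script"
        i=$((i + 1))
    done
}
```

For example, `stagger_crontab 10 /path/to/check-a.sh /path/to/check-b.sh` prints one entry that fires at minutes 0,10,20,... and one that fires at minutes 1,11,21,..., which can then be pasted into a crontab.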
These are mostly scripts left over from a previous Big Brother installation. I haven't yet converted them to live entirely in a Hobbit universe, so they still load up the BB definitions and use the BB bin/bb program to deliver reports, but they're working just fine from cron. In fact, most of them were never fired off by the BB ext script functions. My old, single 733MHz CPU DL380 was trucking along without a problem, running around 1600 checks spread out over 400-odd hosts, until it blew out its power supply...
I don't know offhand how many of my URL checks could be converted - quite a few checks are doing logins, or following links through several pages - but I was thinking of doing exactly that. Maybe I'll rethink that strategy... :)
Ralph Mitchell
On Thu, Nov 16, 2006 at 09:31:53AM -0600, James Wade wrote:
After adding the additional URL's, I started getting Timeouts across the board. If I look at the graph for old URL's, it shows the graph increasing at the same time I implemented all the new URL's yesterday. ie. I went from about 30 to 60 URL's.
By default, Hobbit runs lots of network tests in parallel. It has been seen that this can overwhelm either a server or some of your network infrastructure; or just generate enough traffic that packets are dropped on their way to the Hobbit server.
60 URL's aren't a whole lot, though.
Still, you can try lowering the number of concurrent tests that Hobbit performs. The "--concurrency=N" option for bbtest-net does that (goes in hobbitlaunch.cfg). See the bbtest-net(1) man-page.
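As an illustration of where the option goes, here is a small sketch that prints a copy of hobbitlaunch.cfg with --concurrency=N set. It assumes bbtest-net is started from a line of the form "CMD bbtest-net ..." in that file, which may not match every installation; set_concurrency is a hypothetical helper, not a Hobbit tool:

```shell
#!/bin/sh
# set_concurrency FILE N
# Print FILE with "--concurrency=N" set on any CMD line that runs bbtest-net.
# Assumption: the network test task is launched by a "CMD bbtest-net ..." line;
# check your own hobbitlaunch.cfg, since the layout may differ.
set_concurrency() {
    # First drop any existing --concurrency option, then append the new one.
    sed -e '/CMD.*bbtest-net/ s/[[:space:]]*--concurrency=[0-9]*//' \
        -e "/CMD.*bbtest-net/ s/[[:space:]]*\$/ --concurrency=$2/" \
        "$1"
}
```

Something like `set_concurrency hobbitlaunch.cfg 10 > hobbitlaunch.cfg.new` leaves the original file untouched so the result can be reviewed before it is put in place.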
Henrik
Morning all,
One request on this matter that I'd like to suggest (perhaps in a
future release) is individual HTTP test timeout settings. My customer
needs to monitor a few external URL's that we rely on for various
things. One in particular has frequent problems.
So we upped the default timeout on bbtest-net to 20s, which helped,
but it's still not really enough, and I'm reluctant to increase it
further across the board for just one bad egg. If there's a workaround
for this particular problem, I'd be happy to hear suggestions.
Thanks and Regards,
Richard.
To unsubscribe from the hobbit list, send an e-mail to hobbit-unsubscribe at hswn.dk
-- Richard Leyton - richard at leyton.org http://www.leyton.org
On 11/17/06, Richard Leyton <richard at leyton.org> wrote:
So we upped the default timeout on bbtest-net to 20s, which helped but it's still not really enough, and I'm reluctant to increase it further the board for just one bad egg. If there's a workaround to this particular problem, I'd be happy to hear suggestions.
You could try doing what I do, and run an external script to grab the page. I use curl to actually fetch web pages. Here's a minimal script to give you an idea:
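#!/bin/sh
# MACHINE, BB and BBDISP are expected to come from the BB/Hobbit environment.
TIMEOUT=60                # give up after 60 seconds
COOKIES=/tmp/cookies
CURLOPTS="-s -S -L -b $COOKIES -c $COOKIES -m $TIMEOUT"
TEST=http

# $CURLOPTS is left unquoted on purpose so it splits into separate options.
curl $CURLOPTS -o /tmp/page.html http://server.domain.com
ret=$?
if [ "$ret" -ne 0 ]; then
    MESSAGE="Something broke. Curl code: $ret"
    COLOR=red
else
    MESSAGE="Everything is peachy keen!"
    COLOR=green
fi

# Status message: header line with a timestamp, then the message text.
LINE="status $MACHINE.$TEST $COLOR `date`
$MESSAGE"
$BB $BBDISP "$LINE"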
You can also examine the page for "interesting stuff", such as text that wouldn't appear if the server is broken, or text that shouldn't appear if it's working fine. Curl also handles secure servers, proxies, several different authentication mechanisms, and can give you timing information if you want it for graphs.
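A rough sketch of those two ideas, checking the saved page for expected and unexpected text and asking curl for timing, might look like this. The function names, arguments and example strings are assumptions for illustration; only the curl options (-s, -S, -L, -m, -o, -w with the time_total variable) are standard curl:

```shell
#!/bin/sh
# page_color FILE MUST_HAVE MUST_NOT_HAVE
# Print "green" when FILE contains MUST_HAVE and does not contain
# MUST_NOT_HAVE; print "red" otherwise. Both strings are matched as
# fixed text, and both must be non-empty.
page_color() {
    if grep -qF -- "$2" "$1" && ! grep -qF -- "$3" "$1"; then
        echo green
    else
        echo red
    fi
}

# fetch_seconds URL FILE TIMEOUT
# Save URL to FILE and print the total transfer time in seconds.
# The return status is curl's exit code, so non-zero still means failure.
fetch_seconds() {
    curl -s -S -L -m "$3" -o "$2" -w '%{time_total}\n' "$1"
}
```

In the script above, COLOR could then be set with something like `COLOR=$(page_color /tmp/page.html "Welcome" "Internal Server Error")`, where both strings are placeholders for whatever text marks a healthy or broken page on the site being checked.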
Ralph Mitchell
I found a work-around to my problem.
I was monitoring some pages where the security certificate was internally generated. If I went to those pages manually, via a browser window, it would give me a security alert pop-up window saying the security certificate name is invalid and doesn't match the site. (Which it doesn't because they are just using a self-generated one in development for testing)
Anyways, I had about 10 boxes that had this problem, and all 10 were getting timeouts at 10 seconds. This resulted in all the other http tests starting to get random timeouts as well. Almost as though all the other http tests were waiting on these 10 machines.
After I removed the 10 machines from bb-hosts, everything started working just fine, and my graphs went down to 0.XX seconds.
Any suggestions on how I can get past this delay caused by the security alert? What about the reason that a delay in a few boxes causes all http tests to delay on all other machines?
Thanks....James
-----Original Message-----
From: Henrik Stoerner [mailto:henrik at hswn.dk]
Sent: Friday, November 17, 2006 2:14 AM
To: hobbit at hswn.dk
Subject: Re: [hobbit] HTTP Test Timeout & Delays
On Fri, Nov 17, 2006 at 08:34:07AM -0600, James Wade wrote:
I found a work-around to my problem.
I was monitoring some pages where the security certificate was internally generated. If I went to those pages manually, via a browser window, it would give me a security alert pop-up window saying the security certificate name is invalid and doesn't match the site. (Which it doesn't because they are just using a self-generated one in development for testing)
Anyways, I had about 10 boxes that had this problem, and all 10 were getting timeouts at 10 seconds. This resulted in all the other http tests starting to get random timeouts as well. Almost as though all the other http tests were waiting on these 10 machines.
After I removed the 10 machines from bb-hosts, everything started working just fine, and my graphs went down to 0.XX seconds.
Any suggestions on how I can get past this delay caused by the security alert? What about the reason that a delay in a few boxes causes all http tests to delay on all other machines?
This is very odd. What SSL library are you using for the network tests? Just run "bbtest-net --version" and you should get:
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.8a 11 Oct 2005
LDAP library: OpenLDAP 20130
It sounds as if the SSL library is attempting to verify the authenticity of the SSL certificate from your server. But I've never heard of it doing this by default. So I'd like to know which version of OpenSSL you are using, so I can see if there's a configuration setting that Hobbit can tweak to disable this.
Regards, Henrik
$ ./bbtest-net --version
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.8b 04 May 2006
LDAP library: OpenLDAP 20328
On Thursday 16 November 2006 16:31, James Wade wrote:
I'm having problems with the HTTP test.
Yesterday, I added additional URL's to monitor. We monitor a load of URL's, and yesterday, I added a lot.
After adding the additional URL's, I started getting Timeouts across the board. If I look at the graph for old URL's, it shows the graph increasing at the same time I implemented all the new URL's yesterday.
ie. I went from about 30 to 60 URL's.
Does Hobbit stagger URL checks? Is there a way to stagger them?
I have a similar problem. I have +/- 200 network tests (ping, ...) and +/- 20 http tests, all on the local LAN. I added a second Hobbit server on a remote connection (1 Mbit/s) and I had lots of timeouts on the http checks.
By default, Hobbit runs 254 network checks in parallel. Let's say the first packet of each check is 1 kbyte: that is a burst of 254 kbyte (about 2 Mbit), and I only have a 1 Mbit/s line! The solution was to limit bbnet to only 4 parallel checks by adding --concurrency=4 to the bbnet start command.
Actually, the strange thing is that there is a load balancer involved. If
I run the remote http check through the load balancer, I get +10 seconds.
Checking the http server directly gives a normal result. This only happens with the 254
parallel checks; with 4 parallel checks everything is normal.
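To put rough numbers on the bandwidth estimate in this message, the arithmetic can be sketched as below. It assumes 1 kbyte = 8 kbit and a 1 Mbit/s line = 1000 kbit/s, and it ignores protocol overhead and retransmissions; the function names are illustrative only:

```shell
#!/bin/sh
# burst_kbit CHECKS KBYTES_PER_CHECK -> size of the initial burst in kbit
burst_kbit() {
    echo $(( $1 * $2 * 8 ))
}

# drain_ms BURST_KBIT LINE_KBIT_PER_S -> milliseconds needed to push the
# burst through the line (integer arithmetic, overhead ignored)
drain_ms() {
    echo $(( $1 * 1000 / $2 ))
}

burst_kbit 254 1     # 2032 kbit for 254 parallel checks
drain_ms 2032 1000   # 2032 ms on a 1 Mbit/s line
burst_kbit 4 1       # 32 kbit for 4 parallel checks
drain_ms 32 1000     # 32 ms on the same line
```

Under these assumptions the first packets of 254 parallel checks alone keep the line busy for about two seconds, while 4 parallel checks need only a few tens of milliseconds.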
Stef
participants (5)
- henrik@hswn.dk
- jkwade@futurefrontiers.com
- ralphmitchell@gmail.com
- richard@leyton.org
- stef.coene@docum.org