randomizing execution of tests
Greetings hobbit gurus [0],
While I am still trying to search my way to an answer via the archives
of this list and google, I'm hoping someone could point me in the
right direction.
I've got a bb-hosts file with 8 server process instances getting
tested. Each instance gets tested with 3 HTTP requests (2 GET, 1
POST). All 8 server processes live on the same physical OS instance.
This results in 24 HTTP requests getting sent from hobbit within
1/100th of a second. This causes the load on the host to spike, and
generates contention w/in each server to satisfy the requests. This
same setup is repeated for hundreds of hosts and hundreds of processes.
Is there a way to tell hobbit to take all of the entries in bb-hosts
and test them in a random order w/in the 1 minute testing interval?
This would end up staggering the arrival of each HTTP test somewhat
and lessen contention within each HTTP server and on each host.
Thanks,
-dave
[0] Of which I am not, but ... maybe one day.
-- Dave Paper
MCSE is to computers as McDonalds Certified Chef is to fine cuisine.
I don't think that's possible with Xymon right now, but it can be done if you're up to a little scripting. I had an aging, single 733MHz cpu DL380 running web page checkouts on 400+ hosts, generating around 2700 reports, running at various intervals from 30 seconds to 24 hours.
The trick is to use cron for scheduling...
Something like this, for instance:
============= cut here ============
#!/bin/sh

TESTHOST=www.google.com
TESTURL=http://$TESTHOST/
TIMEOUT=30

# Grab *just* the headers, simulating Xymon's builtin http check
MESSAGE=`curl -m $TIMEOUT \
     -w 'Seconds: %{time_total}\n' \
     -s -S -L -I $TESTURL | $GREP -v Set-Cookie`

# NB: $? is the exit status of the last command in the pipeline ($GREP), not of curl
if [ "$?" -eq "0" ]; then
   COLOR=green
else
   COLOR=red
fi

# convert dots to commas in the hostname
MACHINE=`echo $TESTHOST | $SED -e 's/\./\,/g'`

$BB $BBDISP "status $MACHINE.home $COLOR `date`
$MESSAGE"
============= cut here ============
You'd run that from xymon's crontab using a command line like:
$HOME/server/bin/bbcmd $HOME/ping-google.sh > /tmp/ping-google.out 2>&1
at whatever interval is appropriate for the target.
Ralph Mitchell
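A note on staggering with the cron approach: jobs scheduled for the same minute still start together. One way to spread them out, not suggested in the thread itself, is a short random sleep at the top of each check script. The sketch below is only an illustration; random_delay and MAXDELAY are made-up names, and it assumes an awk with srand() and a date that understands +%s.

```shell
#!/bin/sh
# Sketch only: random_delay and MAXDELAY are illustrative names, not Xymon settings.

# Print a pseudo-random integer in the range 0 .. max-1.
# The seed mixes the shell's PID with the current time so that scripts
# started by cron in the same second do not all pick the same delay.
random_delay() {
    awk -v max="$1" -v seed="$(( $$ + $(date +%s) ))" \
        'BEGIN { srand(seed); print int(rand() * max) }'
}

# Usage, e.g. at the top of ping-google.sh:
#   sleep "$(random_delay "${MAXDELAY:-30}")"
```

Keep the maximum delay well below the cron interval, otherwise one run can overlap the next.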
On Fri, Feb 6, 2009 at 3:48 PM, Ralph Mitchell <ralphmitchell at gmail.com> wrote:
> The trick is to use cron for scheduling...
> Something like this, for instance:
>
> MESSAGE=`curl -m $TIMEOUT \
>      -w 'Seconds: %{time_total}\n' \
>      -s -S -L -I $TESTURL | $GREP -v Set-Cookie`
This curl command looks like all I need: an extension script, used instead of the built-in http:// test, to get a host-specific HTTP timeout.
I could just use this instead of urlplus.pl, correct?
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?
On Fri, Feb 6, 2009 at 4:45 PM, Asif Iqbal <vadud3 at gmail.com> wrote:
> This curl command looks like all I need as an extension script instead of http:// to get my host-specific http timeout
> I could just use this instead of urlplus.pl, correct?
Yes, you could do exactly that. You'll probably want to make the above script into a function or child script and feed it the hostname, url & max time values pulled from a file.
Ralph Mitchell
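For illustration, here is one way the "function fed from a file" idea could look. This is a sketch under assumptions, not Ralph's actual script: the function names and the three-column file format (hostname, url, max time) are invented here, the ".home" test name is carried over from the script earlier in the thread, and $BB / $BBDISP are expected to come from bbcmd. Unlike the original, it takes the colour from curl's own exit status and filters the cookies afterwards.

```shell
#!/bin/sh
# Sketch only: function names and the file format are assumptions.
# Run it under bbcmd so that $BB and $BBDISP are set.

# Convert dots to commas in the hostname, as in the original script.
machine_name() {
    echo "$1" | ${SED:-sed} -e 's/\./,/g'
}

# check_url URL TIMEOUT
# Fetch only the headers. Sets COLOR from curl's exit status and
# MESSAGE from curl's output (including any error text).
check_url() {
    if MESSAGE=$(curl -m "$2" -w 'Seconds: %{time_total}\n' -s -S -L -I "$1" 2>&1); then
        COLOR=green
    else
        COLOR=red
    fi
    MESSAGE=$(echo "$MESSAGE" | ${GREP:-grep} -v Set-Cookie)
}

# run_checks FILE
# FILE holds one "hostname url timeout" entry per line;
# blank lines and lines starting with # are skipped.
run_checks() {
    while read -r host url timeout; do
        case "$host" in ''|'#'*) continue ;; esac
        check_url "$url" "$timeout"
        $BB $BBDISP "status $(machine_name "$host").home $COLOR $(date)
$MESSAGE"
    done < "$1"
}
```

A file entry such as "slowhost.example.com http://slowhost.example.com/ 30" (a hypothetical host) would give that one server a 30 second limit without touching the default timeout for everything else.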
You can also stretch out your testing interval by limiting the concurrency in bbtest-net. See the man page for the exact syntax.
Thanks, Larry Barber
On Sat, Feb 7, 2009 at 12:09 AM, Larry Barber <lebarber at gmail.com> wrote:
> You can also stretch out your testing interval by limiting the concurrency in bbtest-net. See the man page for the exact syntax.
My problem is not on the hobbit server side, where reducing the concurrency limit would help. It is actually one specific HTTP server which takes longer than the default 10 seconds to respond. I don't want to change the default timeout for all hosts. I was looking for a {host,service}-specific timeout. Since that does not exist, an extension script with curl for just that one HTTP server will do the job.
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?
On Thu, Feb 05, 2009 at 12:25:48PM -0500, David Paper wrote:
> I've got a bb-hosts file with 8 server process instances getting tested.
> Each instance gets tested with 3 HTTP requests (2 GET, 1 POST). All 8 server processes live on the same physical OS instance. This results in 24 HTTP requests getting sent from hobbit within 1/100th of a second.
> This causes the load on the host to spike, and generates contention w/in each server to satisfy the requests. This same setup is repeated for hundreds of hosts and hundreds of processes.
Yeah, Xymon can be pretty aggressive about testing network services. It's unfortunate that you hit the same server with multiple requests at once, although it is a bit unusual that a web/application server has a problem with just 24 simultaneous requests.
But of course, it depends on what your web application does.
> Is there a way to tell hobbit to take all of the entries in bb-hosts and test them in a random order w/in the 1 minute testing interval? This would end up staggering the arrival of each HTTP test somewhat and lessen contention within each HTTP server and on each host.
I'm afraid not, at least not directly.
What you *could* do is set up more than one task to run the network tests, using the "NET:foo" definition in bb-hosts to create groups of tests that can be run simultaneously. From your description it sounds as if you have 8 "pseudo-hosts" defined in bb-hosts, each of them with 3 HTTP tests. I.e.
0.0.0.0 web1 # http://web1/test1 http://web1/test2
               post;http://web1/test3;blababla
0.0.0.0 web2 # http://web2/test1 http://web2/test2
               post;http://web2/test3;blababla
etc. for your 8 server processes. Is that correct? Or would it at least be possible to configure it that way?
If you can do that, then you can assign each of web1, web2, web3 and so on a different NET: setting, and then run the checks for each of these NET groups one at a time.
E.g. if you want to run the tests for "web1" and "web2" separately from "web3" and "web4", then your bb-hosts file would be like this:
0.0.0.0 web1 # http://web1/test1 http://web1/test2
               post;http://web1/test3;blababla
               NET:testgroup1
0.0.0.0 web2 # http://web2/test1 http://web2/test2
               post;http://web2/test3;blababla
               NET:testgroup1
0.0.0.0 web3 # http://web3/test1 http://web3/test2
               post;http://web3/test3;blababla
               NET:testgroup2
0.0.0.0 web4 # http://web4/test1 http://web4/test2
               post;http://web4/test3;blababla
               NET:testgroup2
So you have two "NET:" groups of tests - "testgroup1" and "testgroup2". Then you change the [bbnet] task to run this script, "runnetworktests.sh", instead of just launching bbtest-net:
#!/bin/sh

# Run each NET group of tests separately
# This script passes all commandline options to bbtest-net

BBLOCATION=testgroup1 bbtest-net $*
BBLOCATION=testgroup2 bbtest-net $*

# Finally, run the tests that have no NET definition
bbtest-net --test-untagged $*

exit 0
The [bbnet] task (in hobbitlaunch.cfg) would then have
CMD runnetworktests.sh --report --ping --checkresponse
instead of the default "CMD bbtest-net --report --ping --checkresponse"
The only "problem" with this is that you have to configure by hand which tests can run simultaneously. Also, each invocation of bbtest-net has to parse the whole bb-hosts file, so there is a small overhead in doing that.
Regards, Henrik
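With more than a couple of NET groups, the wrapper above could be written as a loop. The following is a sketch along those lines, not a tested replacement for Henrik's script: NETGROUPS, PAUSE and BBTESTNET are names invented here (BBTESTNET only exists so the loop can be tried with a stand-in command), while BBLOCATION and --test-untagged are used exactly as in the script above.

```shell
#!/bin/sh
# Sketch only: NETGROUPS, PAUSE and BBTESTNET are illustrative names.

# run_groups OPTIONS...
# Runs bbtest-net once per NET group listed in $NETGROUPS, optionally
# pausing $PAUSE seconds after each group, then runs the untagged tests.
# All arguments are passed through to bbtest-net.
run_groups() {
    for group in $NETGROUPS; do
        BBLOCATION=$group ${BBTESTNET:-bbtest-net} "$@"
        sleep "${PAUSE:-0}"
    done
    # Finally, run the tests that have no NET definition
    ${BBTESTNET:-bbtest-net} --test-untagged "$@"
}

# Usage, as the body of runnetworktests.sh:
#   NETGROUPS="testgroup1 testgroup2" PAUSE=5
#   run_groups "$@"
```

The pause is an addition that spreads the groups a little further apart in time. The total of all runs plus pauses still has to fit inside the interval of the [bbnet] task. If you also want to limit concurrency as Larry suggested, the option can be passed through the same way, provided your bbtest-net man page lists --concurrency=N.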
Hi,
Henrik Størner wrote:
> The only "problem" with this is that you get to do the configuration of what tests can run simultaneously by hand.
I'd like to second that feature request (if it's not already one, I would like to make it one ;-)). We have ~1k such "servers", but there are only ~100 real machines. (In fact, it's a bit more complicated.) So it can happen that some machines are hit by up to 20 monitoring requests simultaneously. These machines are not simple HTTP servers but something more advanced. Unfortunately each of these servers can only deal with 4 requests in parallel; the others are queued up and delayed. The same goes for real, non-monitoring requests: they are delayed during such bursts of monitoring requests and users of the servers have to wait.
OTOH with so many servers we can't manage several monitoring groups while keeping in mind how many checks are in each group and which one still has "free slots" available. Having such a randomization and spreading in time integrated into bbtest-net would be really great. (Thinking about it, spreading in time is maybe difficult because you never know how long all tests will take. Randomization of test order should be quite simple. But I don't know anything about hobbit internals.)
Kind regards,
Frank Gruellich
Navteq (DE) GmbH, Map24 Systems and Networks
On Fri, Feb 13, 2009 at 06:01:44PM +0100, Frank Gruellich wrote:
> OTOH with so many servers we can't manage several monitoring groups keeping in mind how many checks are in one groups and which one still has "free slots" available. Having such a randomization and spreading in time integrated into bbtest-net would be really great. (Thinking about it, spreading in time is maybe difficult because you never know how long all tests will take. Randomization of test order should be quite simple. But I don't know anything about hobbit internals.)
You're right that randomizing the sequence of tests is simple - the attached patch against 4.3.0 should do that nicely.
Completely untested, but it should work :-) It won't be difficult to port over to the 4.2.3 version if needed.
Spreading things out over a longer time requires much more of a re-design. That may happen - I have some ideas about doing a major re-design of how network tests are done - but it will be a while before that evolves into any code.
Regards, Henrik
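The patch itself is an attachment and is not reproduced in this archive. For anyone using the cron/script approach from earlier in the thread instead, the same idea (randomizing the order in which a list of checks is processed) can be sketched in shell. This is not the patch and says nothing about how bbtest-net does it internally; shuffle_lines is a made-up helper, and it assumes awk with srand(), sort, cut, and a date that understands +%s.

```shell
#!/bin/sh
# Sketch only: shuffle_lines is an illustrative helper, unrelated to the patch.

# shuffle_lines [FILE...]
# Prints the input lines in pseudo-random order: each line is tagged with
# a random key, the lines are sorted by that key, and the key is removed.
shuffle_lines() {
    awk -v seed="$(( $$ + $(date +%s) ))" \
        'BEGIN { srand(seed) } { printf "%.9f\t%s\n", rand(), $0 }' "$@" |
        sort -n | cut -f2-
}

# Usage:
#   shuffle_lines hostlist.txt | while read -r host; do ...; done
```

Shuffling only changes the order of the checks. It does not by itself spread them over the interval, which is the harder problem Henrik describes.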
Now this is what I'm talking about!
Thanks to Ralph, Asif, Larry, Frank and of course, Henrik.
The servers being tested are constantly being improved, but at the
moment they can take several seconds to respond. While a server is
busy, it chews an entire CPU, so when Hobbit's net tests run in
parallel and hit all 8 servers on the host at once, the host runs out
of CPU.
I've dropped the hobbit concurrency down from 512 to 50, but it hasn't
had much of an effect.
The patch sounds like it'll do the trick.
Thanks!
-dave
On Feb 14, 2009, at 3:05 AM, Henrik Størner wrote:
> You're right that randomizing the sequence of tests is simple - the attached patch against 4.3.0 should do that nicely.
> <randomize-tests.patch>
-- Dave Paper cerberus at ginch.org
"Hello, I must be going." --Groucho