I am working on setting up graphs to track an application. I have created a test that produces the output at the end of this message. I want to use SPLITNCV to handle it, but I don't want to be required to update the hobbit configuration and restart when the number of indexes in the system increases.
Is there a way to define the SPLITNCV_testname variable so that it can handle this? Also, I want to set up the live_count and broker_count variables as counters, but still be able to track negative changes as well as positive ones. I'm not sure whether the DERIVE format mentioned in the docs handles negative changes. Normally the value marches ever higher, but when there are problems it goes down, and I want to see that on the graph. The _time variables and the rest of the _count variables will be gauges.
Is this something that can be accomplished? If I have to set up all the graphs as gauge and deal with an ever-increasing value for the special entries, then I will do that.
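To make the counter question concrete, here is a minimal sketch of the arithmetic, not rrdtool's actual code. As far as I know, a DERIVE data source stores the per-second change between two samples and lets it go negative, while a COUNTER data source treats any decrease as a counter wrap. The function names, the 300-second interval and the 32-bit wrap size are assumptions made for illustration.

```python
def derive_rate(prev, curr, seconds):
    """Per-second change between two samples; negative when the value drops."""
    return (curr - prev) / seconds


def counter_rate(prev, curr, seconds, bits=32):
    """Per-second change where a drop is assumed to be a counter wrap.

    Simplified: only a single wrap size (2**bits) is considered.
    """
    delta = curr - prev
    if delta < 0:
        delta += 2 ** bits  # pretend the counter overflowed and started again
    return delta / seconds


# live_count losing 300 records between two samples taken 300 seconds apart
# (the interval is an assumed value).
print(derive_rate(30635889, 30635589, 300))   # a small negative rate
print(counter_rate(30635889, 30635589, 300))  # a huge bogus positive rate
```

Under these assumptions DERIVE shows the drop as a dip below zero, whereas COUNTER would turn it into a large spike.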
This is on the production system, running version 4.2. I started with the source package from the Debian backports archive, then applied the central BBWin patch and the allinone patch in addition to the existing Debian-specific patches. I had to resolve a couple of conflicts, but got it to compile and run.
Script output:
live_count : 30635889
diff_count : 0
broker_time : 1355
broker_count : 30635889
rss_time : 4525
rss_count : 1680777
indy1s2_time : 863
indy1s2_count : 53532
indy1s3_time : 34
indy1s3_count : 789
indy2s1_time : 23
indy2s1_count : 674006
indy2s2_time : 25
indy2s2_count : 952450
indy3s1_time : 21
indy3s1_count : 1004714
indy4s1_time : 32
indy4s1_count : 997684
indy4s2_time : 32
indy4s2_count : 996035
indy5s1_time : 27
indy5s1_count : 1000602
indy5s2_time : 77
indy5s2_count : 999003
indy6s1_time : 26
indy6s1_count : 1003493
indy6s2_time : 25
indy6s2_count : 1001920
indy7s1_time : 28
indy7s1_count : 999484
indy7s2_time : 26
indy7s2_count : 1003172
indy8s1_time : 25
indy8s1_count : 1004038
indy8s2_time : 24
indy8s2_count : 1000058
indy9s1_time : 25
indy9s1_count : 999745
indy9s2_time : 24
indy9s2_count : 998908
indy10s1_time : 28
indy10s1_count : 995340
indy10s2_time : 25
indy10s2_count : 1001914
indy11s1_time : 21
indy11s1_count : 998153
indy11s2_time : 20
indy11s2_count : 1000448
indy12s1_time : 21
indy12s1_count : 995504
indy12s2_time : 22
indy12s2_count : 996100
indy13s1_time : 26
indy13s1_count : 997570
indy13s2_time : 50
indy13s2_count : 999154
indy14s1_time : 22
indy14s1_count : 997059
indy14s2_time : 21
indy14s2_count : 995457
indy15s1_time : 27
indy15s1_count : 1003780
indy15s2_time : 25
indy15s2_count : 997950
indy16s1_time : 66
indy16s1_count : 999673
indy16s2_time : 34
indy16s2_count : 994129
indy17s1_time : 28
indy17s1_count : 998302
indy17s2_time : 56
indy17s2_count : 975723
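For illustration only, a short sketch of how output in this "name : value" form could be read without knowing the index names in advance, which is the property being asked for. The helper names parse_status and rrd_type are hypothetical, and the choice of DERIVE for live_count and broker_count simply mirrors the wish stated above; none of this is Hobbit's own parser.

```python
import re

# One "name : value" pair; works whether pairs are on separate lines or not.
PAIR = re.compile(r"(\w+)\s*:\s*(-?\d+)")


def parse_status(text):
    """Return {name: int(value)} for every pair found in the script output."""
    return {name: int(value) for name, value in PAIR.findall(text)}


def rrd_type(name):
    """Assumed mapping: the two running totals are DERIVE, the rest GAUGE."""
    return "DERIVE" if name in ("live_count", "broker_count") else "GAUGE"


sample = "live_count : 30635889\ndiff_count : 0\nindy1s2_time : 863\n"
for name, value in parse_status(sample).items():
    print(name, rrd_type(name), value)
```

Because the names are discovered from the data, a new indyNsM pair needs no change to the parsing step in this sketch.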
Shawn Heisey wrote:
{snip}
I have not been able to find a SPLITNCV example like the following one for regular NCV: http://www.hswn.dk/~henrik/howtograph.txt
Shawn,
I'm not using SPLITNCV because I wanted more flexibility in the format of the status report (comment lines, single colons that don't delimit values, and so on), so I use an external script instead. You may find my earlier reply useful, though: http://www.hswn.dk/hobbiton/2008/10/msg00159.html
With the external script mechanism you don't need to restart Hobbit when your test generates additional indexes, only when you add new tests. I'm not entirely sure whether SPLITNCV behaves the same way, although it looks OK; you sound perfectly at home with the source, so have a look at that (do_ncv.c). In case you're interested, my parsing script is at the end of this message: enter the test name list and change the regex to suit your needs. The commented-out lines date from when I was using a single RRD file for all indices, but that doesn't give the flexibility of displaying multiple graphs or adding additional indices.
Graham Nayler
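#!/usr/bin/python
# External RRD script: argv[2] is the test name, argv[3] the file holding the status report.

import sys, re

def main():
    #print len(sys.argv), sys.argv
    if sys.argv[2] in (<enter test name list here>):
        #print "%s scanning file '%s'"%(sys.argv[2], sys.argv[3])
        data = ""
        lineno = 0
        f = open(sys.argv[3], 'r')
        for line in f:
            lineno = lineno + 1
            # Skip the two header lines of the status report.
            if lineno > 2:
                # <junk> <datasource_name> :: <signed number> <junk>
                mo = re.match(r"(.*\s+)?([^\s]+)\s*::\s*(-?[0-9\.]*).*$", line)
                if mo is not None:
                    if len(mo.group(3)) > 0:
                        # One RRD file per datasource: DS definition, file name, value.
                        print("DS:%s:GAUGE:600:U:U" % mo.group(2))
                        # Single-RRD variant, commented out:
                        #if len(data) > 0:
                        #    data = data + ":" + mo.group(3)
                        #else:
                        #    data = mo.group(3)
                        print("%s.%s.rrd" % (sys.argv[2], mo.group(2)))
                        print(mo.group(3))
        f.close()
        # Single-RRD variant, commented out:
        #if( len(data) > 0 ):
        #    print "%s.rrd"%sys.argv[2]
        #    print data

if __name__ == "__main__":
    main()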
----- Original Message -----
From: "Shawn Heisey" <hobbit at elyograg.org>
To: <hobbit at hswn.dk>
Sent: Monday, October 13, 2008 9:56 PM
Subject: Re: [hobbit] wildcards or regex with SPLITNCV
Shawn Heisey wrote:
{snip}
I have not been able to find a SPLITNCV example like the following one for regular NCV:
http://www.hswn.dk/~henrik/howtograph.txt
Graham Nayler wrote:
{snip}
I finally got around to looking at this. I think I'm even more confused. Not sure where you got the idea I'm comfortable with the source ... I've looked at the 4.3 sources trying to get rid of warnings and get it working, but the only thing that did was remind me just how many years it's been since I did any C programming. My eyes glazed over a bit with your python too; I haven't invested any time in that language yet.
I'll start at the beginning and tell everyone what I'm trying to do and the difficulties I'm facing. This is an attempt to monitor a multi-server EasyAsk search index at a remote data center, to which I have no back-end connectivity. The systems have no way to reach the Internet. I wouldn't have designed it that way; it came with a company we acquired. I can log onto them by making an ssh connection to a gateway server that has a NIC in the remote LAN, and there are public-facing webservers that can also reach it. We are going to move everything out of the data center within the next six months, so I have no plans to redesign the network until it is moved to headquarters.
The data is generated by a CGI script running on the public-facing webserver pair, which an external server-side shell script on Hobbit retrieves with wget. The CGI script queries the search broker and each individual index server. It notes the total number of records held by each index server, adds them all up, and compares that value to the number of records reported by the broker. It also records how long each query takes, in milliseconds. It's basically a machine-readable rewrite of a script that produces a pretty status page. We can't watch the values on that page 24/7, so I want to graph them to watch for problems.
I didn't go with the external script idea because the RRD docs say it doesn't scale well. I was hoping that NCV or SPLITNCV would handle it easily. I am leery of implementing things that don't scale well - I've been bitten in the past because the boss liked what he saw on something I'd hacked together without thought to performance and wanted to deploy it everywhere.
What I'd like to see is a series of graphs, the first of which should have the total count and the broker count, then a graph with just the difference. Then I'd like to have graphs that work like the disk graphs, where it aggregates the individual broker counts. Following that, another series of graphs that aggregate the response times.
I think it might be easier to implement and easier to read if I have four separate columns on the host entry, something like i_totals, i_diff, i_counts, and i_time.
Forgetting about scalability, are there good examples for how to accomplish this, or a kind soul willing to guide me through the process? I can tweak the CGI script and the script on Hobbit that calls it in any way required.
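As a rough illustration of the four-column idea, the sketch below sorts already-parsed values into the i_totals, i_diff, i_counts and i_time groups named above. The function split_columns and the rule for which value goes where are assumptions for the example; sending each group to Hobbit as its own status column is left out.

```python
def split_columns(values):
    """Group {name: value} pairs into the four proposed columns."""
    columns = {"i_totals": {}, "i_diff": {}, "i_counts": {}, "i_time": {}}
    for name, value in values.items():
        if name in ("live_count", "broker_count"):
            columns["i_totals"][name] = value   # summed total and broker total
        elif name == "diff_count":
            columns["i_diff"][name] = value     # difference between the two
        elif name.endswith("_count"):
            columns["i_counts"][name] = value   # per-index record counts
        elif name.endswith("_time"):
            columns["i_time"][name] = value     # query times in milliseconds
    return columns


values = {"live_count": 30635889, "broker_count": 30635889, "diff_count": 0,
          "broker_time": 1355, "indy1s2_time": 863, "indy1s2_count": 53532}
for column, members in split_columns(values).items():
    print(column, sorted(members))
```

With one test per column, each column can have a different number of traces on its graph.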
Shawn,
Sorry, I obviously overinterpreted your earlier posts about patching and getting your snapshot working.
I can't really comment about your application, other than to say I think you'll have a major problem getting different numbers of traces on different graphs within a single test. Running multiple tests, with some having individual traces/graph and some having multiple, is probably the only way you'll be able to manage that.
But back to your SPLITNCV problem: I finally had a closer look at it, as it's so closely related to what I've been doing myself. Yes, it will allow adding datasources within a test without restarting hobbit (or, more accurately and more seriously, without having to delete and reinitialise the RRD files and so lose previous history). I don't know about its support in 4.2, but it is supported in 4.3, although it is broken there.
Here's my reply to someone else today on the subject, which describes the usage and the fix http://www.hswn.dk/hobbiton/2008/10/msg00423.html
Regarding the python script: essentially it skips the first two header lines, then parses any lines it sees of the format
<any old junk> {space} <datasource_name> {space} :: <signed floating point value> <more junk>
and for each line found writes the following three lines to stdout:
DS:<datasource_name>:GAUGE:600:U:U
<testname>.<datasource_name>.rrd
<value>
As it receives each <value> the script host (hobbitd_rrd) updates (or generates if required) the named RRD file.
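The per-line behaviour described here can be sketched as a small function that reuses the regex from the script. The name rrd_lines and the test name "search" are made up for the example. Note that the pattern expects a double colon, so output using a single colon would need a different regex.

```python
import re

# Same pattern as in the parsing script: <junk> <name> :: <signed number> <junk>
LINE = re.compile(r"(.*\s+)?([^\s]+)\s*::\s*(-?[0-9\.]*).*$")


def rrd_lines(testname, line):
    """Return the three output lines for one report line, or [] if it has no value."""
    mo = LINE.match(line)
    if mo is None or not mo.group(3):
        return []
    return [
        "DS:%s:GAUGE:600:U:U" % mo.group(2),      # datasource definition
        "%s.%s.rrd" % (testname, mo.group(2)),    # one RRD file per datasource
        mo.group(3),                              # the value to store
    ]


for out in rrd_lines("search", "Query time indy1s2_time :: 863 ms"):
    print(out)
```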
Subsequently, I've changed it to output a DS line equivalent to what the SPLITNCV mechanism produces:
DS:lambda:GAUGE:600:U:U
partly because I had some very long datasource names, and RRD throws an error (and fails to create the file) if it sees datasource names longer than 19 characters.
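A small sketch of why the fixed name helps: the long, variable name moves into the RRD file name, and the datasource inside the file is always called lambda. The check below reflects my understanding of the rrdtool naming rule (1 to 19 characters from letters, digits and underscore). The helper names are hypothetical.

```python
import re


def is_valid_ds_name(name):
    """True if name fits the assumed rrdtool rule: 1-19 chars of [A-Za-z0-9_]."""
    return re.fullmatch(r"[A-Za-z0-9_]{1,19}", name) is not None


def rrd_lines_lambda(testname, name, value):
    """Three output lines using the fixed datasource name 'lambda'."""
    return [
        "DS:lambda:GAUGE:600:U:U",
        "%s.%s.rrd" % (testname, name),  # the long name only appears in the file name
        str(value),
    ]


long_name = "a_very_long_datasource_name"
print(is_valid_ds_name(long_name))  # too long to be used as a DS name
print(rrd_lines_lambda("search", long_name, 42))
```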
FYI, as I see it, the overheads of using the external script mechanism, over and above the SPLITNCV method, are:
- one process fork per test report
- write the body of the test report to a temporary disk file
- run the script
- delete the temporary disk file
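The sequence of steps listed above can be sketched as follows. This is only a Python illustration of the cost per report, not how hobbitd_rrd (which is written in C) implements it. The function name run_external is hypothetical, and the argument order (hostname, testname, file name) is an assumption based on the script reading the test name from argv[2] and the file name from argv[3].

```python
import os
import subprocess
import sys
import tempfile


def run_external(script_argv, hostname, testname, body):
    """Write the report to a temp file, run the script on it, then clean up."""
    fd, path = tempfile.mkstemp(prefix="rrd_msg_")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(body)                      # temporary disk file per report
        result = subprocess.run(               # one child process per report
            script_argv + [hostname, testname, path],
            stdout=subprocess.PIPE, universal_newlines=True, check=True)
        return result.stdout
    finally:
        os.unlink(path)                        # delete the temporary file


# Stand-in "script" that just echoes the report file it was given.
echo = [sys.executable, "-c",
        "import sys; sys.stdout.write(open(sys.argv[3]).read())"]
print(run_external(echo, "host1", "search", "indy1s2_time :: 863\n"))
```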
Graham Nayler
----- Original Message -----
From: "Shawn Heisey" <elyograg at elyograg.org>
To: <hobbit at hswn.dk>; <apalmer at mainstreamdata.com>
Sent: Monday, October 20, 2008 7:38 PM
Subject: Re: [hobbit] wildcards or regex with SPLITNCV
{snip}