In <4213B272.7040306 at nandomedia.com> Tom Georgoulias <tgeorgoulias at nandomedia.com> writes:
I'm using the filerstats2bb script from deadcat.net to get data from my Netapp filers and display it in hobbit. This is what is displayed on the status page:
conn, cpu, disk, info, inode, qtree, trends, user_quota
The data displayed is accurate, but the only graph that works is conn. The rest are severely broken.
That figures, since the "conn" test is run by Hobbit (bbtest-net) and reports data in a form that Hobbit knows how to handle.
I would like to fix this, starting with CPU. I'm hoping that what I learn here can be used when I attempt to create custom graphs for user_quota & qtree with the custom RRD feature described in hobbitd_larrd.
You can use some of it, but there is a difference between fixing an existing handler (hobbit already handles some "cpu" data) and adding a new handler that hobbit does not know about: fixing the cpu handler means changing the current C code.
The rest of this message concerns only the load average/CPU graph problem, since I figure this ought to work without any modification.
For example, this is the contents of a status summary displayed on the CPU status page:

    == Wed Feb 16 14:59:46 EST 2005 - CPU Utilization on filerA.nandomedia.com is OK. Uptime: 63 days, 06:57:54.29, load=1
The best way of working with the RRD data that Hobbit handles is to snoop on the data that is sent from hobbitd to the hobbitd_larrd program. You can do that by listening on the hobbit "status" channel:

    ~/server/bin/bbcmd sh hobbitd_channel --channel=status cat

When the "cpu" status arrives, you'll see something like this:

    @@status#121308|1108589727.548324|172.16.10.2||voodoo.hswn.dk|cpu|1108591527|green||green|1106668421|0||0|
    status voodoo,hswn,dk.cpu green Wed Feb 16 22:35:27 CET 2005 up: 23 days, 2 users, 171 procs, load=11
    top - 22:35:27 up 23 days, 48 min, 2 users, load average: 0.24, 0.11, 0.09
    Tasks: 170 total, 1 running, 169 sleeping, 0 stopped, 0 zombie
    Cpu(s): 4.2% us, 1.5% sy, 0.1% ni, 91.2% id, 2.8% wa, 0.1% hi, 0.1% si
    Mem: 646876k total, 635204k used, 11672k free, 194116k buffers
    Swap: 787176k total, 23608k used, 763568k free, 123284k cached
    [lots of lines from "top" snipped]
    @@

The first line with "@@status..." is the beginning of a message - it has some information that hobbitd picks out from all messages, like the hostname, test-name, color etc. The important thing here is to see that hobbitd does see that it is a "cpu" status - there's "|cpu|" in the first line. That means hobbitd_larrd will send this message through the "cpu" handler in hobbitd/larrd/do_la.c.

So we need to look at what the do_la.c file does.

    eoln = strchr(msg, '\n');
    if (eoln) *eoln = '\0';

This finds the first new-line character, and cuts off anything after that. So essentially, it only looks at the first line of the status message.

    p = strstr(msg, "up: ");
    if (p) {
        .... process the message ....

This searches the message (or rather, the first line of it) for the string "up: ". I suspect this is where it breaks for your Netapp reports, because they have "Uptime:", not "up: ":
Wed Feb 16 14:59:46 EST 2005 - CPU Utilization on filerA.nandomedia.com is OK. Uptime: 63 days, 06:57:54.29, load=1
Yes, computers are picky about such details ...

So the first fix is to change those lines above to handle a report with the keyword "Uptime:" - e.g. like this:

    p = strstr(msg, "up: ");
    if (!p) p = strstr(msg, "Uptime:");
    if (p) {

Just one line added. But in this case, I think it makes all the difference - because the rest of the report looks like it will be handled just fine by the current code in do_la.c.

I've added this fix to my sources.

Not much info here about doing custom graphs, I'm afraid. But if you look over the example in the hobbitd_larrd man-page, it should get you started. If not, feel free to ask for more help.

Henrik

PS: If you want me to look at that Netapp disk-report that isn't being graphed, just send me an example of what such a report looks like.

H.
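
For anyone who wants to try the matching logic outside of Hobbit, here is a minimal standalone sketch of the two steps described in this message: cutting the status message at the first newline, then looking for "up: " with a fallback to "Uptime:". The function name find_uptime_keyword is hypothetical and does not exist in do_la.c; the real handler does more work after the match, which is not reproduced here.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, NOT part of Hobbit's do_la.c.
 * Truncates msg at its first newline (modifying it in place), then
 * returns a pointer to the uptime keyword on that first line:
 * "up: " if present, otherwise "Uptime:". Returns NULL if neither
 * keyword appears on the first line.
 */
char *find_uptime_keyword(char *msg)
{
    char *eoln;
    char *p;

    assert(msg != NULL);

    /* Only the first line of the status message is examined. */
    eoln = strchr(msg, '\n');
    if (eoln) *eoln = '\0';

    /* Format used by the Unix clients. */
    p = strstr(msg, "up: ");

    /* Fallback for reports that say "Uptime:" instead. */
    if (!p) p = strstr(msg, "Uptime:");

    return p;
}
```

Called on a writable copy of the Netapp status line quoted earlier, this should return a pointer to the text starting at "Uptime: 63 days", while a message with neither keyword on its first line yields NULL. Note that the buffer is modified, so pass a copy if the full message is still needed.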