Dominique Frise wrote:
Hi,
We track "surgemail" processes using the following rule in hobbit-clients.cfg:
HOST=xyz PROC ./surgemail min=0 TRACK=surgemail
The ps listing in msg.xyz.txt reports 315 "./surgemail" processes, while the rrd graph only shows ~30 processes.
Here is the last corresponding dataset of the processes.surgemail.rrd file (after flushing the cache by stopping Xymon):
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<!-- Round Robin Database Dump -->
<rrd>
  <version> 0003 </version>
  <step> 300 </step> <!-- Seconds -->
  <lastupdate> 1239775972 </lastupdate> <!-- 2009-04-15 08:12:52 CEST -->
  <ds>
    <name> count </name>
    <type> GAUGE </type>
    <minimal_heartbeat> 600 </minimal_heartbeat>
    <min> 0.0000000000e+00 </min>
    <max> NaN </max>
    <!-- PDP Status -->
    <last_ds> 30 </last_ds>
    <value> 5.1600000000e+03 </value>
    <unknown_sec> 0 </unknown_sec>
  </ds>
  <!-- Round Robin Archives -->
  <rra>
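As a quick sanity check on the dump above, the PDP scratch value can be tied back to the stored reading. This is only a sketch: it assumes that RRDtool accumulates (value x seconds elapsed in the current step) for a GAUGE data source, and that the reading stayed constant during the current step. The function names are made up for illustration; the constants are copied from the dump.

```python
# Values copied from the RRD dump above.
STEP = 300                 # <step>, in seconds
LAST_UPDATE = 1239775972   # <lastupdate>, epoch seconds
LAST_DS = 30               # <last_ds>
PDP_VALUE = 5.16e+03       # <value> under "PDP Status"


def seconds_into_step(last_update: int, step: int) -> int:
    """Seconds elapsed since the start of the current step."""
    return last_update % step


def implied_gauge(pdp_value: float, last_update: int, step: int) -> float:
    """Reading implied by the PDP scratch value.

    Assumption: the scratch value is (reading * elapsed seconds) and the
    reading did not change during the current step.
    """
    elapsed = seconds_into_step(last_update, step)
    if elapsed == 0:
        raise ValueError("last update falls on a step boundary")
    return pdp_value / elapsed


if __name__ == "__main__":
    print(seconds_into_step(LAST_UPDATE, STEP))
    print(implied_gauge(PDP_VALUE, LAST_UPDATE, STEP))
```

If that assumption holds, the scratch value is consistent with last_ds: the RRD file really was fed a count of about 30, so the wrong number reaches RRDtool rather than being produced by the graphing step.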
We tried to let Xymon recreate a fresh rrd, without success. The same configuration worked with Hobbit-4.2.0 and the same RRDtool version (1.2.19).
The rrd code has changed quite a lot since 4.2.0, and I can't really tell which code is involved in order to debug this. Any help appreciated!
Dominique
This is a more general problem. The data messages passed to hobbitd_rrd are truncated.
Debugging showed that the messages leave hobbitd correctly but are read incorrectly by hobbitd_channel.
Below is the debug output of hobbitd and hobbitd_channel, with extra printf lines added to dump the messages.
------ hobbitd.log --------
2009-04-17 16:22:21 <- do_message/1
2009-04-17 16:22:21 -> do_message/1 (86 bytes): data blind.ifstat
2009-04-17 16:22:21 -> update_statistics
2009-04-17 16:22:21 <- update_statistics
2009-04-17 16:22:21 -> oksender
2009-04-17 16:22:21 <- oksender(1-a)
2009-04-17 16:22:21 ->handle_data
2009-04-17 16:22:21 -> posttochannel
2009-04-17 16:22:21 Posting message 2 to 1 readers
2009-04-17 16:22:21 <- posttochannel
2009-04-17 16:22:21 <-handle_data
2009-04-17 16:22:21 msg: data blind.ifstat solaris bge:0:bge0:obytes64 267829127 bge:0:bge0:rbytes64 1208836563
2009-04-17 16:22:21 <- do_message/1
2009-04-17 16:22:21 -> do_message/1 (104 bytes): data blind.vmstat
2009-04-17 16:22:21 -> update_statistics
2009-04-17 16:22:21 <- update_statistics
2009-04-17 16:22:21 -> oksender
2009-04-17 16:22:21 <- oksender(1-a)
2009-04-17 16:22:21 ->handle_data
2009-04-17 16:22:21 -> posttochannel
2009-04-17 16:22:21 Posting message 3 to 1 readers
2009-04-17 16:22:21 <- posttochannel
2009-04-17 16:22:21 <-handle_data
2009-04-17 16:22:21 msg: data blind.vmstat solaris 0 0 0 11938312 10700752 3 19 0 0 0 0 0 2 2 2 0 343 2099 1006 1 2 97
2009-04-17 16:22:21 <- do_message/1
2009-04-17 16:22:21 -> do_message/1 (1315 bytes): data blind.iostatdisk
------- rrd-data.log --------
2009-04-17 16:22:21 Peer not up, flushing message queue
2009-04-17 16:22:21 Connecting to peer 0.0.0.0:0
2009-04-17 16:22:21 Peer is UP
2009-04-17 16:22:21 inbuf: @@data#2/blind|1239978141.731166|130.223.27.23||blind|ifstat|sunos|intraDevServ,adminSys
data blind.ifstat solaris bge:0:bge0:obytes64 267829127 bge:0:bge0:rbytes64 12088365 @@
2009-04-17 16:22:21 inbuf: @@data#3/blind|1239978141.731938|130.223.27.23||blind|vmstat|sunos|intraDevServ,adminSys
data blind.vmstat solaris 0 0 0 11938312 10700752 3 19 0 0 0 0 0 2 2 2 0 343 2099 1006 1 2 @@
The last values of ifstat and vmstat (1208836563 and 97) become 12088365 and NULL, respectively. I hope Henrik can help us solve this issue.
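To make the comparison between the two logs less error-prone, the payload dumped by hobbitd can be checked against the one seen in the hobbitd_channel inbuf. The sketch below is only an illustration in Python, not part of Xymon: lost_suffix is a made-up helper, and the strings are copied from the logs above with whitespace as it appears in the dump (the original separators may have been newlines).

```python
def lost_suffix(sent: str, received: str) -> str:
    """Return the tail of `sent` that is missing from `received`.

    Raises ValueError if `received` is not a prefix of `sent`, which would
    indicate corruption rather than plain truncation.
    """
    if not sent.startswith(received):
        raise ValueError("received text is not a prefix of the sent text")
    return sent[len(received):]


# Payloads as logged by hobbitd ("msg:" lines) and by hobbitd_channel ("inbuf:").
SENT_IFSTAT = ("data blind.ifstat solaris bge:0:bge0:obytes64 267829127 "
               "bge:0:bge0:rbytes64 1208836563")
RECV_IFSTAT = ("data blind.ifstat solaris bge:0:bge0:obytes64 267829127 "
               "bge:0:bge0:rbytes64 12088365")

SENT_VMSTAT = ("data blind.vmstat solaris 0 0 0 11938312 10700752 3 19 "
               "0 0 0 0 0 2 2 2 0 343 2099 1006 1 2 97")
RECV_VMSTAT = ("data blind.vmstat solaris 0 0 0 11938312 10700752 3 19 "
               "0 0 0 0 0 2 2 2 0 343 2099 1006 1 2")

if __name__ == "__main__":
    for sent, received in ((SENT_IFSTAT, RECV_IFSTAT),
                           (SENT_VMSTAT, RECV_VMSTAT)):
        print(repr(lost_suffix(sent, received)))
```

In both cases the received text is a clean prefix of what was sent, so only the last few characters of each message go missing.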
Dominique
Finally... found the issue in hobbitd.c. Patch hobbitd.patch is attached.

Installation
------------
Place in top Xymon install dir. and patch with:
# patch -p0 < hobbitd.patch
# gmake
Copy hobbitd to your install bin dir.

Dominique