Hi,
We track "surgemail" processes using the following rule in hobbit-clients.cfg:
HOST=xyz PROC ./surgemail min=0 TRACK=surgemail
The ps listing in msg.xyz.txt reports 315 "./surgemail" processes, while the rrd graph only shows ~30 processes.
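To reproduce the count from the client message, a minimal sketch along these lines could be used. It assumes that a PROC pattern without a leading "%" is matched as a plain substring of each ps line, and the sample ps lines and the function name count_matching are made up for illustration (they are not taken from msg.xyz.txt):

```python
def count_matching(ps_lines, pattern="./surgemail"):
    """Count ps lines containing the pattern as a plain substring.

    Assumption: this mirrors how a PROC rule without a leading '%'
    is evaluated; it is not taken from the Xymon source.
    """
    return sum(1 for line in ps_lines if pattern in line)


if __name__ == "__main__":
    # Hypothetical ps output, for illustration only.
    sample = [
        "root      1234     1  0 08:00 ?  00:00:01 ./surgemail",
        "root      1235  1234  0 08:00 ?  00:00:00 ./surgemail",
        "root      2001     1  0 08:01 ?  00:00:00 /usr/sbin/sshd",
    ]
    print(count_matching(sample))
```

Running the same function over the real ps section of msg.xyz.txt should give a number comparable to the 315 mentioned above, provided the substring assumption holds.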
Here is the corresponding data source section from a dump of the processes.surgemail.rrd file (taken after flushing the cache by stopping Xymon):
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<!-- Round Robin Database Dump -->
<rrd>
<version> 0003 </version>
<step> 300 </step> <!-- Seconds -->
<lastupdate> 1239775972 </lastupdate> <!-- 2009-04-15 08:12:52 CEST -->
<ds>
<name> count </name>
<type> GAUGE </type>
<minimal_heartbeat> 600 </minimal_heartbeat>
<min> 0.0000000000e+00 </min>
<max> NaN </max>
<!-- PDP Status -->
<last_ds> 30 </last_ds>
<value> 5.1600000000e+03 </value>
<unknown_sec> 0 </unknown_sec>
</ds>
<!-- Round Robin Archives -->
<rra>
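One consistency check on the dump, as a hedged sketch: assuming RRDtool accumulates a GAUGE data source as value times elapsed seconds since the last step boundary (my understanding of the PDP scratch value, not something confirmed in this thread), the pending value of 5160 divided by the seconds elapsed in the current step should give the number that was actually fed in. The function names below are illustrative only; the constants are copied from the dump.

```python
# Values copied from the dump above.
STEP = 300
LAST_UPDATE = 1239775972
PDP_VALUE = 5.16e3


def seconds_into_step(last_update, step):
    """Seconds elapsed since the most recent step boundary."""
    return last_update % step


def implied_gauge(pdp_value, last_update, step):
    """Average GAUGE value implied by the pending PDP accumulator.

    Assumption: the accumulator holds value * elapsed_seconds and
    there are no unknown seconds in the current step.
    """
    elapsed = seconds_into_step(last_update, step)
    if elapsed == 0:
        raise ValueError("update falls exactly on a step boundary")
    return pdp_value / elapsed


if __name__ == "__main__":
    print(seconds_into_step(LAST_UPDATE, STEP))         # 172
    print(implied_gauge(PDP_VALUE, LAST_UPDATE, STEP))  # 30.0
```

If that assumption is right, 5160 / 172 = 30 matches last_ds, which would suggest the value 30 is what reaches the RRD file, rather than the graph misrendering a larger stored value.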
We tried letting Xymon recreate a fresh rrd file, without success. The same configuration worked with Hobbit-4.2.0/RRDtool 1.2.19 (same RRDtool version as now).
The RRD code has changed quite a bit since 4.2.0, and I don't really see which code is involved, so I don't know where to start debugging this. Any help is appreciated!
Dominique