That's odd. If you're on a box with a lot of memory, writing out to a tmpfs might help. For your worker, I'd suggest just adding a debug line or two in front of that section.
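Something like the following is what I mean by a debug line. It's only a sketch: field_ok() is a made-up helper, not anything in xymond_mysql.c, and I'm guessing at how your metadata array and destination buffer are declared. If I'm reading the MySQL docs right, the third argument to mysql_escape_string() is the length of the source string, and the destination needs room for 2*len+1 bytes, so it's worth logging both before the call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Xymon): returns 1 if metadata[idx] looks
 * safe to escape into a destination buffer of dest_size bytes, 0 otherwise.
 * When it returns 0 it writes the reason to the given log stream. */
int field_ok(char **metadata, int count, int idx, size_t dest_size, FILE *log)
{
    size_t len;

    assert(log != NULL);

    if (metadata == NULL || idx < 0 || idx >= count || metadata[idx] == NULL) {
        fprintf(log, "DEBUG: field %d missing (have %d fields)\n", idx, count);
        return 0;
    }

    len = strlen(metadata[idx]);

    /* The escaped copy can be up to twice the source length, plus the NUL. */
    if (2 * len + 1 > dest_size) {
        fprintf(log, "DEBUG: field %d is %zu bytes, needs %zu, buffer has %zu\n",
                idx, len, 2 * len + 1, dest_size);
        return 0;
    }

    return 1;
}
```

You'd call it right before the escape, e.g. skip the message when field_ok(metadata, nfields, 1, sizeof(timestamp), stderr) returns 0 (nfields being whatever counter your worker keeps, and assuming timestamp is an array rather than a pointer).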
WRT the checkpoint file, the only real corruption I've seen myself occurred when malformed UTF-8 packets came in -- I'd accidentally included gzip output in a script I'd put in my /local directory :/.
You could try modifying the init startup/shutdown script to copy over the checkpoint file every once in a while, and then point a copy of xymond over to it in --debug mode and see if it chokes... and if so, how far in.
Thinking about it, a --validate flag to xymond might not be too hard to whip up.
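In the meantime, a crude stand-alone check might at least tell you where a copied checkpoint goes bad. The sketch below is not a xymond feature and knows nothing about the checkpoint format; it just assumes the file is supposed to be valid UTF-8 text (which is my guess, based on the gzip incident above) and reports the byte offset of the first sequence that isn't. The function names are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns the byte offset of the first malformed UTF-8 sequence in buf,
 * or -1 if the whole buffer is well-formed. */
long first_bad_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0, k, need;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];

        if (c < 0x80) { i++; continue; }              /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) need = 1;    /* 2-byte sequence */
        else if (c >= 0xE0 && c <= 0xEF) need = 2;    /* 3-byte sequence */
        else if (c >= 0xF0 && c <= 0xF4) need = 3;    /* 4-byte sequence */
        else return (long)i;          /* stray continuation or invalid lead */

        if (len - i - 1 < need) return (long)i;       /* truncated at EOF */

        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if ((c == 0xE0 && buf[i + 1] < 0xA0) || (c == 0xED && buf[i + 1] > 0x9F) ||
            (c == 0xF0 && buf[i + 1] < 0x90) || (c == 0xF4 && buf[i + 1] > 0x8F))
            return (long)i;

        for (k = 1; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return (long)i;

        i += need + 1;
    }
    return -1;
}

/* Loads a whole file and scans it; returns -2 if it cannot be read. */
long first_bad_utf8_in_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    unsigned char *buf;
    long size, result;
    size_t got;

    if (fp == NULL) return -2;
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0) {
        fclose(fp);
        return -2;
    }
    rewind(fp);

    buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(fp); return -2; }

    got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    result = first_bad_utf8(buf, got);
    free(buf);
    return result;
}
```

Run first_bad_utf8_in_file() against a copy of the checkpoint, never the live one. If it returns an offset, dumping the bytes around it should show which host/test record the junk landed in.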
Regards,
-jc
--- Original Message ---
Heh, I'd have to look at the whole stachg channel to find the needle in the haystack for that.
Got a couple of core dumps here (about once every 2-3 days):
Program terminated with signal 11, Segmentation fault.
#0 main (argc=2, argv=0xbfd1a444) at xymond_mysql.c:371
xymond_mysql.c line 371: mysql_escape_string(timestamp,metadata[1],timestampbytes);
timestampbytes is the strlen of timestamp.
I am not strong in C, however, so to find that needle I wrote a Perl version that pipes hist to mysql (that way, it logs exceptions etc.). However, the Perl version can't handle the rate of messages (between 300 and 500/sec).
Bleh
What I STRONGLY need help with is my xymond.chk getting corrupted. Henrik looked at one a while back and gave me something to look at/fix, which I did, but it's still getting corrupted (and then any time it crashes, I lose all states).
Do you know of a good way to parse/manage the chk file to see what it doesn't like?