Heh, I'd have to look at the whole stachg channel to find the needle in the haystack for that.
I get a core dump here about once every 2-3 days:
Program terminated with signal 11, Segmentation fault.
#0  main (argc=2, argv=0xbfd1a444) at xymond_mysql.c:371
xymond_mysql.c line 371 is:
mysql_escape_string(timestamp,metadata[1],timestampbytes);
where timestampbytes is the strlen of timestamp.
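A segfault on that line is what you would expect if metadata[1] is NULL, or if the destination buffer is too small. mysql_escape_string(to, from, length) takes the length of the source string and needs a destination of at least length*2+1 bytes. A minimal sketch of the checks that could go in front of the call follows; the helper names are hypothetical and not part of xymond_mysql.c or the MySQL client library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of a field, treating a missing (NULL) field as empty. */
size_t field_len(const char *field)
{
    return field ? strlen(field) : 0;
}

/* Worst-case buffer size for escaping `len` source bytes:
 * every byte may expand to two, plus the terminating NUL. */
size_t escaped_size(size_t len)
{
    assert(len <= (SIZE_MAX - 1) / 2);  /* guard against overflow */
    return len * 2 + 1;
}

/* Returns 1 if `field` is present and its escaped form is guaranteed
 * to fit in a destination buffer of `tosize` bytes, 0 otherwise. */
int can_escape(const char *field, size_t tosize)
{
    if (field == NULL) return 0;
    return escaped_size(strlen(field)) <= tosize;
}
```

With something like this, the call at line 371 would only run when can_escape(metadata[1], sizeof(timestamp)) is true (assuming timestamp is an array), and the length argument would be field_len(metadata[1]) rather than the length of the destination.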
I am not strong in C, however, so to find that needle I wrote a Perl version that pipes hist to MySQL (that way it logs exceptions, etc.). However, the Perl version can't handle the rate of messages (between 300 and 500/sec).
Bleh
What I STRONGLY need help with is my xymond.chk getting corrupted. Henrik looked at one a while back and gave me something to look at and fix, which I did, but it's still getting corrupted (and then any time it crashes, I lose all states).
Do you know of a good way to parse/manage the chk file to see what it doesn't like?
On 3/15/13 1:19 PM, "cleaver at terabithia.org" <cleaver at terabithia.org> wrote:
Yeah, that generally means your pipe has backed up too much.
"Rate of messages" is a good metric to keep track of (visible at 5m intervals from the xymond status report). If you're getting 3000 messages every 300 seconds, that's 0.1s you've got to process each message coming in on average, but subject to expected spikes and the buffers running over.
Depending on what you're doing, smoothing out how often you're getting messages to reduce spikes will help, as will filtering at xymond_channel if you're only interested in a subset, along with (obviously) trying to make the message processor more efficient.
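The per-message budget from the rate figures above can be worked out with a few lines of C. This is only an illustration of the arithmetic; the function name is made up:

```c
#include <assert.h>

/* Average time available per message, in milliseconds, given how many
 * messages arrived during an interval of `interval_seconds`. */
double budget_ms(unsigned long messages, double interval_seconds)
{
    assert(messages > 0);
    return interval_seconds * 1000.0 / (double)messages;
}
```

For 3000 messages in 300 seconds this gives 100 ms per message. At the 300-500 messages per second mentioned earlier in the thread, the budget shrinks to roughly 2 to 3.3 ms per message, which is an average and leaves no room for spikes.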
Eventually, it could lead to forking off the handling (if you can do it efficiently and have cores to spare), or using an async queue somewhere.
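As a rough idea of what a queue between the channel reader and the slow part (the database insert) might look like, here is a minimal bounded FIFO. It is a single-threaded sketch: the names and the capacity are assumptions, and a real version shared between threads would need a mutex and condition variable around push and pop (or a pipe, if the handler is forked).

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 1024  /* hypothetical capacity; size it to absorb spikes */

struct msg_queue {
    char *items[QUEUE_CAP];
    size_t head;   /* index of the oldest queued message */
    size_t count;  /* number of messages currently queued */
};

void queue_init(struct msg_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Returns 1 on success, 0 if the queue is full. The caller decides
 * whether to drop the message or wait. */
int queue_push(struct msg_queue *q, char *msg)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAP) return 0;
    q->items[(q->head + q->count) % QUEUE_CAP] = msg;
    q->count++;
    return 1;
}

/* Returns the oldest message, or NULL if the queue is empty. */
char *queue_pop(struct msg_queue *q)
{
    char *msg;
    assert(q != NULL);
    if (q->count == 0) return NULL;
    msg = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return msg;
}
```

The point of the bound is that a full queue becomes an explicit, countable event in your own code instead of a backed-up pipe.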
On the second part, that's interesting... Can you provide a sample msg with a null?
Regards,
-jc
--- Original Message ---
I'll answer that myself: yes, that means whatever is there can't process the channel fast enough.
So, I'll have to go back to my older parser which is getting this:
Core was generated by `xymond_mysql --pidfile=/var/log/xymon/xymond_history.pid'.
Program terminated with signal 11, Segmentation fault.
#0  0x08049de1 in addnetpeer (peername=0x4f8ca0 "") at xymond_channel.c:140
140     xymond_channel.c: No such file or directory.
        in xymond_channel.c
(gdb) where
#0  0x08049de1 in addnetpeer (peername=0x4f8ca0 "") at xymond_channel.c:140
#1  0x00511e9c in ?? ()
#2  0x004f8ca0 in ?? () from /lib/ld-linux.so.2
#3  0x08057190 in stackfgets (buffer=0x80497b0, extraincl=0x2 <Address 0x2 out of bounds>) at stackio.c:434
#4  0x080496c1 in _start ()
Which is getting a null timestamp for some items on the stachg channel :/
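One way a field can come out NULL is the splitting itself. strtok() treats a run of adjacent delimiters as a single one, so an empty field between two '|' characters shifts every later field and can leave the tail of the array unset. The sketch below is a guess at the kind of splitter that avoids this; the function is hypothetical and not taken from the parser's actual code:

```c
#include <assert.h>
#include <string.h>

/* Split `line` in place on '|', keeping empty fields.
 * Stores at most `maxfields` pointers in `fields` and returns how many
 * were stored. Anything beyond `maxfields` fields is dropped. */
int split_fields(char *line, char **fields, int maxfields)
{
    int n = 0;
    char *p = line;

    assert(line != NULL && fields != NULL && maxfields > 0);
    while (n < maxfields) {
        fields[n++] = p;
        p = strchr(p, '|');
        if (p == NULL) break;   /* last field reached */
        *p++ = '\0';            /* terminate this field, step past '|' */
    }
    return n;
}
```

The caller can then check that the returned count is greater than 1 and that fields[1][0] is not '\0' before treating fields[1] as a timestamp, and log the raw line when the check fails. That would also produce the sample message asked for above.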
From: <Clark>, Sean Clark <sean.clark at twcable.com>
Date: Friday, March 15, 2013 11:21 AM
To: "xymon at xymon.com" <xymon at xymon.com>
Subject: [Xymon] Flushing Stale messages?
I have a channel parser that looks at items in the 'stachg' channel
It looks like it's working for me (it parses and does stuff properly)
However my log is filling up with this:
2013-03-15 11:08:29 Flushed 4 stale messages for 0.0.0.0:0
2013-03-15 11:08:30 Flushed 4 stale messages for 0.0.0.0:0
2013-03-15 11:08:31 Flushed 3 stale messages for 0.0.0.0:0
2013-03-15 11:08:32 Flushed 6 stale messages for 0.0.0.0:0
2013-03-15 11:08:33 Flushed 2 stale messages for 0.0.0.0:0
2013-03-15 11:08:34 Flushed 2 stale messages for 0.0.0.0:0
2013-03-15 11:08:35 Flushed 3 stale messages for 0.0.0.0:0
2013-03-15 11:08:36 Flushed 3 stale messages for 0.0.0.0:0
2013-03-15 11:08:37 Flushed 4 stale messages for 0.0.0.0:0
2013-03-15 11:08:38 Flushed 4 stale messages for 0.0.0.0:0
Is this telling me my parser cannot handle the channel in a timely manner, and that messages are growing "stale" and I am dropping things?
-Sean