I have a channel parser that looks at items in the 'stachg' channel.
It looks like it's working for me (it parses and does stuff properly).
However, my log is filling up with this:
2013-03-15 11:08:29 Flushed 4 stale messages for 0.0.0.0:0
2013-03-15 11:08:30 Flushed 4 stale messages for 0.0.0.0:0
2013-03-15 11:08:31 Flushed 3 stale messages for 0.0.0.0:0
2013-03-15 11:08:32 Flushed 6 stale messages for 0.0.0.0:0
2013-03-15 11:08:33 Flushed 2 stale messages for 0.0.0.0:0
2013-03-15 11:08:34 Flushed 2 stale messages for 0.0.0.0:0
2013-03-15 11:08:35 Flushed 3 stale messages for 0.0.0.0:0
2013-03-15 11:08:36 Flushed 3 stale messages for 0.0.0.0:0
2013-03-15 11:08:37 Flushed 4 stale messages for 0.0.0.0:0
2013-03-15 11:08:38 Flushed 4 stale messages for 0.0.0.0:0
Is this telling me that my parser cannot handle the channel in a timely manner, so the messages are growing "stale" and I am dropping things?
-Sean
This E-mail and any of its attachments may contain Time Warner Cable proprietary information, which is privileged, confidential, or subject to copyright belonging to Time Warner Cable. This E-mail is intended solely for the use of the individual or entity to which it is addressed. If you are not the intended recipient of this E-mail, you are hereby notified that any dissemination, distribution, copying, or action taken in relation to the contents of and attachments to this E-mail is strictly prohibited and may be unlawful. If you have received this E-mail in error, please notify the sender immediately and permanently delete the original and any copy of this E-mail and any printout.
I'll answer that myself: yes, that means whatever is reading the channel can't process it fast enough.
So I'll have to go back to my older parser, which is getting this:
Core was generated by `xymond_mysql --pidfile=/var/log/xymon/xymond_history.pid'.
Program terminated with signal 11, Segmentation fault.
#0 0x08049de1 in addnetpeer (peername=0x4f8ca0 "") at xymond_channel.c:140
140 xymond_channel.c: No such file or directory.
in xymond_channel.c
(gdb) where
#0 0x08049de1 in addnetpeer (peername=0x4f8ca0 "") at xymond_channel.c:140
#1 0x00511e9c in ?? ()
#2 0x004f8ca0 in ?? () from /lib/ld-linux.so.2
#3 0x08057190 in stackfgets (buffer=0x80497b0, extraincl=0x2 <Address 0x2 out of bounds>) at stackio.c:434
#4 0x080496c1 in _start ()
It is getting a null timestamp for some items on the stachg channel :/
Yeah, that generally means your pipe has backed up too much.
"Rate of messages" is a good metric to keep track of (visible at 5m intervals from the xymond status report). If you're getting 3000 messages every 300 seconds, that's 0.1s you've got to process each message coming in on average, but subject to expected spikes and the buffers running over.
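The arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name `per_message_budget` is made up here, and the two arguments are whatever message count and interval you read off the xymond status report.

```sh
#!/bin/sh
# Average time budget per message, in seconds.
# $1 = messages seen in the interval, $2 = interval length in seconds.
# The function name is hypothetical, chosen for this sketch.
per_message_budget() {
    awk -v m="$1" -v s="$2" 'BEGIN {
        if (m <= 0) { print "n/a"; exit }   # avoid dividing by zero
        printf "%.3f\n", s / m
    }'
}

per_message_budget 3000 300   # prints 0.100
```

This is an average only; spikes in the incoming rate will need a faster worker than the figure suggests.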
Depending on what you're doing, smoothing out how often you're getting messages to reduce spikes will help, as will filtering at xymond_channel if you're only interested in a subset, along with (obviously) trying to make the message processor more efficient.
Eventually, it could lead to forking off the handling (if you can do it efficiently and have cores to spare), or using an async queue somewhere.
On the second part, that's interesting... Can you provide a sample msg with a null?
Regards,
-jc
Heh, I'd have to look at the whole stachg channel to find the needle in the haystack for that.
I get a couple of core dumps (about one every 2-3 days) here:
Program terminated with signal 11, Segmentation fault.
#0 main (argc=2, argv=0xbfd1a444) at xymond_mysql.c:371
xymond_mysql.c line 371:
mysql_escape_string(timestamp,metadata[1],timestampbytes);
timestampbytes is the strlen of timestamp.
I am not strong in C, however, so to find that needle I wrote a Perl version that pipes hist to MySQL (that way, it logs exceptions and so on). However, the Perl version can't handle the rate of messages (between 300 and 500/sec).
Bleh
What I STRONGLY need help with is my xymond.chk getting corrupted. Henrik looked at one a while back and gave me something to look at and fix, which I did, but it's still getting corrupted (and then any time it crashes, I lose all states).
Do you know of a good way to parse/manage the chk file to see what it doesn't like?
That's odd. If you're on a box with a lot of memory, writing out to a tmpfs might help. For your worker, I'd suggest just adding a debug line or two in front of that section.
WRT the checkpoint file, the only real corruption I've seen myself has occurred when malformed utf-8 packets came in -- I'd accidentally included gzip output in a script I'd put in my /local directory :/.
You could try modifying the init startup/shutdown script to copy over the checkpoint file every once in a while, and then point a copy of xymond over to it in --debug mode and see if it chokes... and if so, how far in.
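A minimal sketch of the "copy the checkpoint file every once in a while" idea, assuming you call it from the init script or from cron. The function name and both paths are placeholders for illustration, not anything shipped with Xymon.

```sh
#!/bin/sh
# Copy a checkpoint file into a snapshot directory under a timestamped name
# and print the path of the copy. Name and paths are hypothetical.
snapshot_checkpoint() {
    src=$1
    destdir=$2
    [ -f "$src" ] || { echo "no checkpoint file at $src" >&2; return 1; }
    mkdir -p "$destdir" || return 1
    dest="$destdir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" && echo "$dest"
}

# Example call (placeholder locations, adjust to your install):
# snapshot_checkpoint /path/to/xymond.chk /var/tmp/xymond-chk-snapshots
```

After a crash, the most recent snapshot is the one to feed to a test copy of xymond in --debug mode.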
Thinking about it, a --validate flag to xymond might not be too hard to whip up.
Regards,
-jc
Just as a note on Perl vs. straight C code:
Using the MySQL libs and C to insert the stachg channel: handles about 1200 msgs/5 minutes before it starts flushing, on a dual-core machine with 8 GB RAM.
Same hardware using Perl and DBD::mysql: tops out at about 300.
As to the debug loading of the chk file:
/sw/xymon/server/bin/xymond --listen=127.0.0.1:1985 --debug --checkpoint-file=./xymond.chk.crashed
31911 2013-03-15 15:23:17 Opening file /sw/xymon/server/etc/hosts.cfg
31911 2013-03-15 15:23:19 Opening file /sw/xymon/server/etc/client-local.cfg
2013-03-15 15:23:19 Setting up network listener on 127.0.0.1:1985
2013-03-15 15:23:19 Setting up signal handlers
2013-03-15 15:23:19 Setting up xymond channels
31911 2013-03-15 15:23:19 Setting up status channel (id=1)
31911 2013-03-15 15:23:19 calling ftok('/sw/xymon/server',1)
31911 2013-03-15 15:23:19 ftok() returns: 0x1000047
31911 2013-03-15 15:23:19 shmget() returns: 0xD6800C
2013-03-15 15:23:19 FATAL: xymond sees clientcount 1, should be 0
Check for hanging xymond_channel processes or stale semaphores
2013-03-15 15:23:19 Cannot setup status channel
That is telling me
Whoops, sent this before I finished typing.
Is this telling me that some xymond_channel process isn't exiting properly, so xymond can't load? It's not telling me much about invalid data for hosts (which is where Henrik pointed me back in the day).
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
On 15-03-2013 20:31, Clark, Sean wrote:
As to the debug loading of chk file:
31911 2013-03-15 15:23:17 Opening file /sw/xymon/server/etc/hosts.cfg
31911 2013-03-15 15:23:19 Opening file /sw/xymon/server/etc/client-local.cfg
2013-03-15 15:23:19 Setting up network listener on 127.0.0.1:1985
2013-03-15 15:23:19 Setting up signal handlers
2013-03-15 15:23:19 Setting up xymond channels
31911 2013-03-15 15:23:19 Setting up status channel (id=1)
31911 2013-03-15 15:23:19 calling ftok('/sw/xymon/server',1)
31911 2013-03-15 15:23:19 ftok() returns: 0x1000047
31911 2013-03-15 15:23:19 shmget() returns: 0xD6800C
2013-03-15 15:23:19 FATAL: xymond sees clientcount 1, should be 0
Check for hanging xymond_channel processes or stale semaphores
2013-03-15 15:23:19 Cannot setup status channel
This happens when xymond has crashed and is restarting, but either some of the old xymond_channel processes are still running (hanging on to a shared memory segment or a semaphore), or the shared memory segments were not cleaned up after the crash.
You can check with ipcs (as the xymon user) if there are any shared memory segments lying around after all of the xymon tasks have exited.
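As a rough sketch of that check, the filter below pulls out the IDs of IPC objects belonging to one user from ipcs output. It assumes the Linux column layout (key, id, owner, ...) and the helper name `ipc_ids_for_owner` is invented for this example; on other systems the columns may differ.

```sh
#!/bin/sh
# Read "ipcs -m" or "ipcs -s" output on stdin and print the ID (column 2)
# of every entry whose owner (column 3) matches the given user.
# Assumes the Linux ipcs layout; the function name is hypothetical.
ipc_ids_for_owner() {
    awk -v o="$1" '$1 ~ /^0x/ && $3 == o { print $2 }'
}

# Usage, once all Xymon tasks have exited:
#   ipcs -m | ipc_ids_for_owner xymon    # leftover shared memory segments
#   ipcs -s | ipc_ids_for_owner xymon    # leftover semaphores
```

If either command prints anything after Xymon has been stopped, those are the leftovers that block the restart.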
I have a script to clean up everything and restart Xymon (writing new code may on rare occasions mean that xymond crashes :-) ). Feel free to try it. If you're not on a Linux box, make sure the "ipcs -m" and "ipcs -s" output has the shmid / semid in column 2. If not, adjust the 'awk' command to grab the correct column.
#!/bin/sh

if [ "$(id -u)" != "$(id -u xymon)" ]
then
    echo "You must be the 'xymon' user to run this."
    exit 1
fi

echo "Stopping Xymon"
~xymon/server/xymon.sh stop
sleep 2
if [ -f /var/run/xymon/xymond.pid ]
then
    echo "Forcing kill of xymon process, PID $(cat /var/run/xymon/xymond.pid)"
    kill -9 "$(cat /var/run/xymon/xymond.pid)"
fi

# "ipcs -s" lists semaphores, "ipcs -m" lists shared memory segments
echo "Cleaning up semaphores"
ipcs -s|grep "^0"|awk '{print $2}'|while read ID; do ipcrm -s $ID; done
echo "Cleaning up shared memory segments"
ipcs -m|grep "^0"|awk '{print $2}'|while read ID; do ipcrm -m $ID; done
echo "Cleaning up socket files"
rm ~xymon/server/tmp/xymond_if

echo "Starting Xymon"
~xymon/server/xymon.sh start

echo "Done"
exit 0
Regards, Henrik
Thanks, I will try it out.
Why does removing the chk file seem to fix it? Does it try to establish old connections?
participants (3)
- cleaver@terabithia.org
- henrik@hswn.dk
- sean.clark@twcable.com