Hi Matt,
The log lines you're seeing are actually from the new xymond process trying to start up, then failing because the port is already in use. I think the timeout right below it is from the previous process's signal handler giving up, based on the timestamps.
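To see what is still holding the port, something like the following might help. It is only a sketch: it assumes the ss tool from iproute2 is installed and that you run it as root so the owning process is shown. The function name port_holder is made up for illustration.

```shell
# Show any TCP listener on the xymond port together with its owning process.
# Assumes ss (iproute2); run as root so the process column is filled in.
port_holder() {
    port="${1:-1984}"    # 1984 is the listener port shown in the log
    ss -ltnp "sport = :$port"
}
```

Running port_holder right after the crash should show whether an old xymond still has 1984 open.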
Can you get a backtrace from xymond's core file? It should be left in /var/lib/xymon/tmp/, or in the (*shudder*) systemd journal somewhere...
If your system is set not to keep them by default, add

    export DAEMON_COREFILE_LIMIT="unlimited"
    ulimit -c unlimited

to /etc/sysconfig/xymonlaunch
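Once there is a core file, a backtrace can be pulled roughly like this. This is a sketch that assumes gdb is installed. The function name is illustrative, and I'm deliberately not guessing where the RPM puts the xymond binary, so pass that path in yourself ('rpm -ql' on the xymon package should list it).

```shell
# Print a full backtrace from an xymond core file.
# Usage: xymond_backtrace <core file> <path to xymond binary>
# Assumes gdb is installed; both paths must be supplied by the caller.
xymond_backtrace() {
    if [ "$#" -ne 2 ]; then
        echo "usage: xymond_backtrace <core file> <xymond binary>" >&2
        return 2
    fi
    core="$1"
    binary="$2"
    # -batch runs the given command and exits; 'bt full' includes local variables
    gdb -batch -ex 'bt full' "$binary" "$core"
}
```

Installing the matching debuginfo package first, if one is available, makes the output far more readable.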
I suspect there might be something corrupted in the xymond checkpoint file. First, do a 'service xymon stop' and make sure all xymon processes are completely gone, including any xymond's still pending, then start xymon back up. If it crashes again, do the same, but move the /var/lib/xymon/xymond.chk checkpoint file out of the way after it's off, and let it come back up.
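For the second attempt, a small helper along these lines could guard against moving the checkpoint while something is still running. It is a sketch under assumptions: the processes run as a user named 'xymon', pgrep is available, and the timestamp suffix is just my choice of naming. Run it as root.

```shell
# Move the xymond checkpoint file aside, but only when no xymon-owned
# process is left. Assumes the daemon user is 'xymon' and pgrep exists.
set_aside_checkpoint() {
    chk="${1:-/var/lib/xymon/xymond.chk}"
    if pgrep -u xymon >/dev/null 2>&1; then
        echo "xymon processes still running; stop them first" >&2
        return 1
    fi
    if [ ! -e "$chk" ]; then
        echo "no checkpoint file at $chk" >&2
        return 1
    fi
    # Keep the old file around with a timestamp suffix for later inspection
    mv "$chk" "$chk.$(date +%Y%m%d-%H%M%S)"
}
```

The sequence would then be 'service xymon stop', then set_aside_checkpoint, then 'service xymon start'.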
If it *still* doesn't come up, there's something else going on. Either way, a full backtrace will help us see exactly where it's dying.
HTH, -jc
On Sat, January 30, 2016 8:28 am, Matt Vander Werf wrote:
As a followup, xymond seems to try and start itself up again after a while (probably because xymonlaunch is still running) and goes for a short while working just fine and then just crashes again with the same messages and results.
-- Matt Vander Werf
On Sat, Jan 30, 2016 at 11:21 AM, Matt Vander Werf <matt1299 at gmail.com> wrote:
Hello,
I'm having a major issue with xymond crashing shortly after the service starts.
I'm using the latest Terabithia RPM for RHEL 7 (4.3.24-3.el7.terabithia).
When I check the status of the xymon service, it shows it as up but with only the xymonlaunch parent process and vmstat processes. Upon restarting the service, I see it start normally (all the normal channel processes, etc.) and then after a while they all go away, leaving the following process behind:
└─2760 xymon-signal 0.0.0.0 status+1d/group:signal<server hostname>.xymond red (Check time of report) - xymond program crashed Fatal signal caught!
along with the xymonlaunch process and some vmstat processes. After a while that process goes away. Sometimes a single xymond_rrd will show up alongside the xymonlaunch and vmstat processes as well after a little while.
I'm already running xymond in --debug mode.
This is what I see in the xymond log around the time of the crash:
2773 2016-01-30 11:02:32.515505 Status: Host=<host>, test=ntp
2773 2016-01-30 11:02:32.515507 -- create_hostlist_t for <host> (<client IP address>)
2773 2016-01-30 11:02:32.515513 Status: Host=<host>, test=conn
2773 2016-01-30 11:02:32.515520 Status: Host=<host>, test=raid
2773 2016-01-30 11:02:32.515529 Status: Host=<host>, test=memory
2773 2016-01-30 11:02:32.515534 Status: Host=<host>, test=files
2773 2016-01-30 11:02:32.515670 Status: Host=<host>, test=procs
2773 2016-01-30 11:02:32.515879 Status: Host=<host>, test=inode
2773 2016-01-30 11:02:32.515891 Status: Host=<host>, test=disk
2773 2016-01-30 11:02:32.516004 Status: Host=<host>, test=cpu
2773 2016-01-30 11:02:32.516605 Loaded 14419 status logs
2016-01-30 11:02:32 Setting up network listener on 0.0.0.0:1984
2016-01-30 11:02:32.516677 Cannot bind to listen socket (Address already in use)
2016-01-30 11:02:59.538906 Whoops ! Failed to send message (Timeout)
2016-01-30 11:02:59.539020 ->
2016-01-30 11:02:59.539023 -> Recipient '<server IP address>', timeout 50
2016-01-30 11:02:59.539024 -> 1st line: 'status+1d/group:signal <server hostname>.xymond red (Check time of report) - xymond program crashed'
It seems to get finished with loading all the hosts and then it crashes (the last host before it crashes is the last client I have alphabetically).
I've tried stopping the service, killing off any remaining xymon-owned processes, and starting the service again, with the same results. I've also tried restarting the xymon server machine itself, with the same crash happening when the service starts the first time.
This just started happening out of the blue a couple of hours ago...
Looking in netstat, there are no active connections using port 1984 on the local side, just a bunch of clients trying to connect to the server with 1984 in the foreign address.
ANY help would be much appreciated as currently our Xymon server is not working!!
Thanks!!
-- Matt Vander Werf
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon