Xymon client only reports once
I've installed Xymon on my home network as a test for a possible installation at work. It's working fine on three out of the four systems, but on the fourth, the client only reports its status to the server once, immediately after being started.
The problem computer is a Pentium MMX with 48MB RAM, running Gentoo Linux.
It looks as if the client is getting hung in the process of sending the second report: "ps aux" shows a sleeping "xymonlaunch" process, and the XYMONTMP directory contains a "xymon_vmstat" file with a timestamp five minutes after the successful update.
I could probably work around this with a cron job that restarts the client every five minutes, but I'd rather fix it properly. Any suggestions on what might be going wrong, or other things I could look at?
Thanks, Mark Wagner
On Mon, February 13, 2012 8:13 pm, Mark wrote:
The vmstat file there sounds normal... Can you run xymonlaunch with --debug and see what it's reporting back? Also, strace what it's doing when the next expected run occurs?
For testing purposes you can bring the interval down to 30s or so. The only change you should notice is having multiple backgrounded vmstat processes going at once in a round-robin fashion.
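To make the round-robin behaviour concrete, here is a small Python sketch that counts how many background vmstat processes overlap for a given client interval. It assumes, as the five-minute timestamp on the xymon_vmstat file suggests, that each client run starts one vmstat that lives for about 300 seconds. The function name and that lifetime are illustrative assumptions, not taken from the Xymon source.

```python
def overlapping_vmstats(interval_s, vmstat_life_s=300, horizon_s=3600):
    """Return the peak number of vmstat processes alive at the same time.

    Assumption (hypothetical model): one vmstat is started per client run,
    every interval_s seconds, and each lives for vmstat_life_s seconds,
    i.e. it is alive over the half-open range [start, start + life).
    """
    starts = range(0, horizon_s, interval_s)
    peak = 0
    for t in range(horizon_s):
        alive = sum(1 for s in starts if s <= t < s + vmstat_life_s)
        peak = max(peak, alive)
    return peak

# At the default 5-minute interval only one vmstat runs at a time;
# at a 30-second interval, ten of them overlap in steady state.
print(overlapping_vmstats(300))  # 1
print(overlapping_vmstats(30))   # 10
```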
-jc
On Tuesday 14 February 2012 11:59:13 am you wrote:
When I run xymonlaunch from the command line with the "--no-daemon" and "--debug" options, there's no output to the terminal.
clientlaunch.log:

    2012-02-14 21:48:23 xymonlaunch starting
    2012-02-14 21:48:23 Loading tasklist configuration from ./etc/clientlaunch.cfg
    15337 2012-02-14 21:48:23 Opening file ./etc/clientlaunch.cfg
    15337 2012-02-14 21:48:23
    15337 2012-02-14 21:48:23 Starting tasklist scan
    15337 2012-02-14 21:48:23 About to start task client
    15338 2012-02-14 21:48:23 client -> Loading environment from /home/xymon/client/etc/xymonclient.cfg area
    15338 2012-02-14 21:48:23 Opening file /home/xymon/client/etc/xymonclient.cfg
    15338 2012-02-14 21:48:23 client -> Assigning stdout/stderr to log '/home/xymon/client/logs/xymonclient.log'
    15337 2012-02-14 21:48:28
    15337 2012-02-14 21:48:28 Starting tasklist scan
    15337 2012-02-14 21:48:28 Task client active with PID 15338
    15337 2012-02-14 21:48:32
    15337 2012-02-14 21:48:32 Starting tasklist scan
The last two lines then repeat every five seconds until I kill the client.
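The log reflects a launcher that wakes every five seconds, scans its task list, and either reports the task as still active or decides whether it is due again. The Python sketch below models that decision so the expected start times of a healthy client can be compared with what the log shows. It assumes a task is restarted at the first scan where it is not running and at least INTERVAL seconds have passed since it last started; that rule is an assumption about xymonlaunch's behaviour, not its actual code, and `simulate_starts` is a hypothetical name.

```python
def simulate_starts(interval_s, runtime_s, horizon_s, scan_s=5):
    """Return the times (in seconds) at which a task would be started.

    Hypothetical model: every scan_s seconds the launcher checks the task;
    it starts it only if the previous run has finished and at least
    interval_s seconds have elapsed since the previous start.
    """
    starts = []
    last_start = None
    for now in range(0, horizon_s, scan_s):
        running = last_start is not None and now < last_start + runtime_s
        if running:
            continue  # the "Task client active with PID ..." case
        if last_start is None or now - last_start >= interval_s:
            starts.append(now)
            last_start = now
    return starts

# A short client run (3 s) with a 30 s interval restarts every 30 s.
print(simulate_starts(30, 3, 120))   # [0, 30, 60, 90]
# A run that outlasts the interval delays the next start until it ends.
print(simulate_starts(30, 40, 120))  # [0, 40, 80]
```

On the problem machine, only the first of these starts ever happens.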
xymonclient.log:

    15338 2012-02-14 21:48:23 client -> Running '/home/xymon/client/bin/xymonclient.sh', XYMONHOME=/home/xymon/client
That one line is the only entry.
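A single entry is easy to spot by eye, but over a longer test it may help to count runs mechanically. This Python sketch pulls timestamps out of log lines in the format shown above (optional PID, date, time, message) and compares the number of runs against what a given interval implies. The regular expression and helper names are illustrative, and the 30-second interval is the test value suggested earlier in the thread, not a Xymon default.

```python
import re
from datetime import datetime

# Optional PID, then "YYYY-MM-DD HH:MM:SS", then the message text.
LINE_RE = re.compile(r"^(?:(\d+) )?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ?(.*)$")

def event_times(lines, needle):
    """Timestamps of every log line whose message contains needle."""
    times = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and needle in m.group(3):
            times.append(datetime.strptime(m.group(2), "%Y-%m-%d %H:%M:%S"))
    return times

def missed_runs(times, until, interval_s):
    """Number of runs missing between the first run and `until`."""
    if not times:
        return None  # no run at all, so there is nothing to measure from
    elapsed = (until - times[0]).total_seconds()
    expected = int(elapsed // interval_s) + 1
    return max(expected - len(times), 0)

log = [
    "15338 2012-02-14 21:48:23 client -> Running "
    "'/home/xymon/client/bin/xymonclient.sh', XYMONHOME=/home/xymon/client",
]
runs = event_times(log, "Running '")
print(len(runs))  # 1
print(missed_runs(runs, datetime(2012, 2, 14, 21, 50, 0), 30))  # 3
```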
strace shows xymonlaunch forking off a new process (the "About to start task client" entry in clientlaunch.log). That child, the client task, then execs xymonclient.sh, which gathers data, sends it off, and exits. The main xymonlaunch process (PID 15337), meanwhile, repeats the following strace output every five seconds, with only the timestamps changing:
    15337 21:56:03 wait4(-1, 0xbffff41c, WNOHANG, NULL) = -1 ECHILD (No child processes)
    15337 21:56:03 time(NULL) = 1329285363
    15337 21:56:03 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
    15337 21:56:03 getpid() = 15337
    15337 21:56:03 write(1, "15337 2012-02-14 21:56:03 \n", 27) = 27
    15337 21:56:03 time(NULL) = 1329285363
    15337 21:56:03 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
    15337 21:56:03 getpid() = 15337
    15337 21:56:03 write(1, "15337 2012-02-14 21:56:03 Starting tasklist scan\n", 49) = 49
    15337 21:56:03 time(NULL) = 1329285363
    15337 21:56:03 rt_sigprocmask(SIG_BLOCK, [CHLD], [RTMIN], 8) = 0
    15337 21:56:03 rt_sigaction(SIGCHLD, NULL, {0x804a950, [], SA_RESTORER, 0x4005d6f8}, 8) = 0
    15337 21:56:03 rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
    15337 21:56:03 nanosleep({5, 0}, 0xbffff224) = 0
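The `wait4(-1, ..., WNOHANG, NULL) = -1 ECHILD` line is the launcher polling for finished children without blocking. ECHILD means the process has no children at all at that moment, which fits the client task having already exited and nothing new having been started. The Python sketch below shows the same non-blocking reap with `os.waitpid`, which raises `ChildProcessError` in the ECHILD case (`os.waitstatus_to_exitcode` needs Python 3.9 or later, and `os.fork` a Unix system). It illustrates the system call's semantics only; `reap_children` is a hypothetical helper, not xymonlaunch's code.

```python
import os
import time

def reap_children():
    """Non-blocking reap, like wait4(-1, &status, WNOHANG, NULL).

    Returns a list of (pid, exit_code) for every child that has exited.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # ECHILD: this process has no children left to wait for.
            break
        if pid == 0:
            # Children exist, but none of them has exited yet.
            break
        reaped.append((pid, os.waitstatus_to_exitcode(status)))
    return reaped

if __name__ == "__main__":
    child = os.fork()
    if child == 0:
        os._exit(3)                   # child: exit at once with status 3
    finished = []
    while not finished:               # parent: poll, as a launcher loop would
        finished = reap_children()
        time.sleep(0.1)
    print(finished == [(child, 3)])   # True
    print(reap_children())            # []  (the ECHILD case in the trace)
```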
There's no change at the 30-second mark (when the client task should be gathering the next set of data), and the only action at the five-minute mark is vmstat waking up.
For comparison, running strace on a working system shows the main process creating a new client task right when it should. One thing that may or may not be relevant: although the log output on both systems contains "Starting tasklist scan" entries, the working client doesn't actually start stat()-ing "clientlaunch.cfg" until after the *second* successful run of the client task; the non-working system never stat()s it at all.
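Rather than reading the traces by hand, the working and non-working traces can be compared mechanically. A minimal Python sketch, assuming the "PID HH:MM:SS syscall(args) = result" line layout shown above (other lines, such as signal notices, are simply skipped), counts one process's system calls and lists when it forks or stat()s clientlaunch.cfg. On Linux a fork usually appears in strace as `clone`, so the sketch checks `clone`, `fork`, and `vfork`. The helper names are hypothetical.

```python
import re
from collections import Counter

# "PID HH:MM:SS[.usec] syscall(": the layout of the trace quoted above.
STRACE_RE = re.compile(r"^(\d+)\s+(\d{2}:\d{2}:\d{2}(?:\.\d+)?)\s+(\w+)\(")

def parse(lines):
    """Yield (pid, time, syscall, raw_line) for each recognisable line."""
    for line in lines:
        m = STRACE_RE.match(line)
        if m:
            yield int(m.group(1)), m.group(2), m.group(3), line

def syscall_counts(lines, pid):
    """Count the system calls made by one process."""
    return Counter(name for p, _, name, _ in parse(lines) if p == pid)

def interesting_events(lines, pid):
    """Times at which pid forks a child or stats clientlaunch.cfg."""
    events = []
    for p, t, name, raw in parse(lines):
        if p != pid:
            continue
        if name in ("clone", "fork", "vfork"):
            events.append((t, "fork"))
        elif name in ("stat", "stat64") and "clientlaunch.cfg" in raw:
            events.append((t, "stat clientlaunch.cfg"))
    return events

sample = [
    "15337 21:56:03 wait4(-1, 0xbffff41c, WNOHANG, NULL) = -1 ECHILD (No child processes)",
    "15337 21:56:03 nanosleep({5, 0}, 0xbffff224) = 0",
]
counts = syscall_counts(sample, 15337)
print(counts["wait4"], counts["clone"])   # 1 0
print(interesting_events(sample, 15337))  # []
```

Feeding each saved trace (for example via `open(...).readlines()`) through `interesting_events` for the xymonlaunch PID would show side by side when each machine forks and when it rereads its configuration.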
The strace logs from both machines are available if anyone thinks they might be useful in figuring out what's happening, but since they're about 2.5MB combined, I don't want to send them to the whole list.
-- Mark Wagner
participants (2)
- cleaver@terabithia.org
- mark@carnildo.com