Wim
I suspect the solution is to find and fix the cause of the buffer overflow. Is there a coredump from which you can get a backtrace?
Another fix might be to put msgcache/xymonfetch into the mix. The msgcache process queues up queries and delivers them when it can.
Cheers,
Jeremy
On 23 February 2018 at 02:31, Wim Nelis <wim.nelis at ziggo.nl> wrote:
On a Raspberry Pi Zero W, xymon-client is running to monitor some sensors and the RPi0 itself. As the primary xymon server is not reachable at times, a minimal xymon server is running on the RPi0 too. Xymonproxy is used to distribute the status and data messages to both the local xymon server and the primary xymon server. The intention of this setup is to have all the RRDs locally complete on the RPi0. If needed, the RRDs can be copied from the RPi0 to the primary xymon server.
The local xymon server listens on port 1985, xymonproxy on port 1984. The latter distributes the messages to two servers, using parameter "--server=127.0.0.1:1985,192.168.178.72:1984". This setup is working, but the graphs created from the local RRDs contain gaps in the periods when the primary xymon server is not reachable. The logfiles of the clients running on the RPi0 contain messages like the following, about twice an hour:
2018-02-21 06:10:01.664965 Whoops ! Failed to send message (Connection failed)
2018-02-21 06:10:01.665673 -> Could not connect to Xymon daemon at 127.0.0.1:1984 (Connection refused)
2018-02-21 06:10:01.665767 -> Recipient '127.0.0.1', timeout 15
2018-02-21 06:10:01.665851 -> 1st line: 'status rpi00.mve green Wed 2018.02.21 06:10:01'
This does explain the gaps. The logfile of xymonproxy shows that the proxy is restarted a dozen times per hour:
2018-02-21 05:55:38.272757 xymonproxy version 4.3.28 starting
2018-02-21 05:55:38.273605 Listening on 0.0.0.0:1984
2018-02-21 05:55:38.273751 Sending to Xymon server(s) 127.0.0.1:1985 192.168.178.72:1984
2018-02-21 05:56:05.304985 Server not responding, message lost
2018-02-21 06:00:30.195973 Server not responding, message lost
2018-02-21 06:00:36.221908 Server not responding, message lost
2018-02-21 06:00:41.231668 Server not responding, message lost
2018-02-21 06:00:41.236076 Server not responding, message lost
*** buffer overflow detected ***: /usr/lib/xymon/server/bin/xymonproxy terminated
2018-02-21 06:00:42.269357 xymonproxy version 4.3.28 starting
2018-02-21 06:00:42.270200 Listening on 0.0.0.0:1984
2018-02-21 06:00:42.270346 Sending to Xymon server(s) 127.0.0.1:1985 192.168.178.72:1984
2018-02-21 06:01:09.301618 Server not responding, message lost
2018-02-21 06:05:29.188224 Server not responding, message lost
2018-02-21 06:05:40.201194 Server not responding, message lost
2018-02-21 06:05:45.208531 Server not responding, message lost
2018-02-21 06:05:45.208936 Server not responding, message lost
2018-02-21 06:05:45.209058 Server not responding, message lost
*** buffer overflow detected ***: /usr/lib/xymon/server/bin/xymonproxy terminated
2018-02-21 06:10:45.237707 xymonproxy version 4.3.28 starting
2018-02-21 06:10:45.239061 Listening on 0.0.0.0:1984
2018-02-21 06:10:45.239219 Sending to Xymon server(s) 127.0.0.1:1985 192.168.178.72:1984
2018-02-21 06:11:11.272425 Server not responding, message lost
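To check how the restarts line up with the lost messages over a longer period, the proxy log can be tallied per hour. The following is only a rough sketch: it assumes the log has one entry per line in the format shown in the excerpt above, and the function name summarize is made up for illustration, not part of Xymon.

```python
import re
from collections import Counter

# Timestamped entry as seen in the excerpt: "YYYY-MM-DD HH:MM:SS.ffffff text"
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2}\.\d+ (.*)$")


def summarize(lines):
    """Count proxy starts, lost messages and overflow aborts per hour.

    Hypothetical helper; assumes one log entry per line.
    """
    starts, lost, overflows = Counter(), Counter(), Counter()
    last_hour = None
    for raw in lines:
        line = raw.strip()
        match = STAMP.match(line)
        if match:
            last_hour = f"{match.group(1)} {match.group(2)}"
            text = match.group(3)
            if text.startswith("xymonproxy version") and text.endswith("starting"):
                starts[last_hour] += 1
            elif text == "Server not responding, message lost":
                lost[last_hour] += 1
        elif "buffer overflow detected" in line and last_hour is not None:
            # The abort line carries no timestamp, so it is attributed
            # to the hour of the last timestamped entry before it.
            overflows[last_hour] += 1
    return starts, lost, overflows
```

Calling summarize() on the open proxy logfile returns three counters keyed by "date hour". If the overflow count follows the lost-message count closely, that would support the idea that the crash is triggered by the unreachable server.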
I have been playing with the queue length, but to no avail. Is it possible to have xymonproxy not terminate every 5 minutes, and instead just report that it could not send a message to a particular server?
Regards, Wim Nelis.
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon