Hi all,
I've got a strange problem that I'm trying to diagnose and would appreciate any help you can give.
We have 2 new AIX servers that were recently set up and are running the hobbit client. We have 62 other AIX servers running the same client absolutely fine.
The problem is that the client data is getting cut off mid-stream, always in the ps output. I've checked the MAX settings and they're all OK; in fact, we have other clients sending larger data files than these that work fine. I've checked the data on the client and it's complete, but if I look in /xymon/data/hostdata on the server, the data is almost always truncated to 69518 bytes. Occasionally a full message (approx 93k) gets through.
There are no messages regarding truncated data in the server logs, and the only message I can find on the client is the following:
2013-02-26 08:41:21 Write error while sending message to bbd at xymonserver:1984
2013-02-26 08:41:21 Whoops ! bb failed to send message - write error
I've googled this extensively and can't find anything that seems relevant to our problem.
Regards,
Neil Simmonds
Senior Operations Analyst (Operations Support Group) Express Gifts Limited
Express House
Clayton Business Park
Accrington
Lancashire
BB5 5JY T: 01254 303092 | E: neil.simmonds at Express-Gifts.co.uk
I get this from time to time, primarily when the xymon host has very limited bandwidth. It seems to me that Xymon will accept whatever data has been received prior to the connection being broken/interrupted, and pretend it is complete (as opposed to discarding it).
If this is happening frequently/all the time, I would suspect firewall settings and/or MTU issues (if it is packet size related). Check that you are not blocking all ICMP and that path MTU discovery is working properly, that no firewall is timing out or blocking the connection for some reason, and that there is enough bandwidth for the messages.
A tcpdump at both client and server could be educational; you could load the captures into Wireshark for analysis.
PS, I wonder when we will get compression and/or encryption for the status messages? Both would assist in making sure the complete message arrives unaltered...
Regards, Adam
-- Adam Goryachev Website Managers www.websitemanagers.com.au
On 26/02/13 8:26 PM, Adam Goryachev wrote:
I get this from time to time, primarily when the xymon host has very limited bandwidth. It seems to me that Xymon will accept whatever data has been received prior to the connection being broken/interrupted, and pretend it is complete (as opposed to discarding it).
The problem is that there isn't a well-defined "end of message" on a standard client report. The message starts with a "client HOSTNAME.OS CLASS" line, then consists of a bunch of sections, each starting with a "[section]" line followed by lines of text. When the client has finished sending its message, it just does a shutdown on the write side of the socket and reads any returned data until EOF. That's it. The server probably doesn't care whether the client even reads the data it sends back, and has no other way of communicating with it anyway.
So if the client connection to the server is interrupted mid-stream, the server quite probably just handles it as a socket shutdown and accepts whatever has been received so far as the whole message.
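To make the point above concrete, here is a minimal sketch in Python of a "send, half-close, read until EOF" exchange. It is not Xymon's code: the helper names, the sample message and the use of a local socket pair are all assumptions made for illustration, and an early close is simulated as a clean EOF (a reset connection would show up as an error instead). It only shows that a reader which relies on EOF cannot tell a complete message from one that stopped partway through.

```python
import socket

def read_until_eof(conn):
    """Reader side: collect bytes until recv() returns b"" (EOF)."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # EOF: the peer closed its write side, for whatever reason
            break
        chunks.append(data)
    return b"".join(chunks)

def send_message(payload, cut_at=None):
    """Send payload over a local socket pair and return what the reader saw.

    cut_at (hypothetical knob) simulates a connection that ends after only
    the first cut_at bytes were written.
    """
    client, server = socket.socketpair()
    try:
        client.sendall(payload if cut_at is None else payload[:cut_at])
        client.shutdown(socket.SHUT_WR)  # half-close: "I have nothing more to send"
        return read_until_eof(server)
    finally:
        client.close()
        server.close()

# Illustrative message in the "client HOSTNAME.OS CLASS" + [section] layout.
MESSAGE = (
    b"client host1.aix aix\n"
    b"[date]\nTue Feb 26 08:41:21 GMT 2013\n"
    b"[ps]\nPID COMMAND\n1 init\n2 syncd\n"
)

full = send_message(MESSAGE)
partial = send_message(MESSAGE, cut_at=70)  # stops inside the [ps] section

print(len(full), len(partial))
```

In both calls the reader simply sees a stream that ends. Without a length prefix or an explicit end marker, the shorter result looks just as valid as the full one.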
PS, I wonder when we will get compression, and/or encryption for the status messages? Both would assist in making sure the complete message arrives un-altered...
Indeed. There are other ways of delivering/fetching messages - maybe worth exploring for more reliable transmission.
David.
Xymon mailing list Xymon at xymon.com http://lists.xymon.com/mailman/listinfo/xymon
-- David Baldwin - Senior Systems Administrator (Datacentres + Networks) Information and Communication Technology Services Australian Sports Commission http://ausport.gov.au Tel 02 62147830 Fax 02 62141830 PO Box 176 Belconnen ACT 2616 david.baldwin at ausport.gov.au Leverrier Street Bruce ACT 2617
On 26 February 2013 19:47, Neil Simmonds <Neil.Simmonds at express-gifts.co.uk>wrote:
There are no messages regarding truncated data in the server logs and the only message I can find on the client is the following,
2013-02-26 08:41:21 Write error while sending message to bbd at xymonserver:1984
2013-02-26 08:41:21 Whoops ! bb failed to send message - write error
Perhaps try running the client script manually like so:
$ cd ~xymon/client/bin
$ sudo -u xymon ./xymoncmd
$ time ./xymonclient.sh
This might show an error you didn't see before. At the very least, it will give you an idea how long it takes to run/fail. You might also run it as:
$ sh -x ./xymonclient.sh
Then see what takes all the time.
Perhaps you could run it through truss to see what system calls are being run when the connection closes. Like so:
$ truss -f ./xymonclient.sh
It's likely to be caused by the transfer taking too long, either because the data is slow to transmit (e.g. a duplex mismatch causing network errors) or because there's too much data to send. You could try increasing the timeout value for xymond on the server by adding "--timeout=N" (from 5 to 60) in tasks.cfg. The man page for xymond says the default is 10 seconds, but the code for v4.3.10 shows 30 seconds.
I don't think the server normally logs a message if it times out a connection in this way. However, if you turn on debug (by adding "--debug" in tasks.cfg), then it should log "No command for update_statistics" when this happens.
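As a rough plausibility check on the timeout theory, the numbers from this thread can be plugged into a small calculation. The sketch below is in Python; the function names are made up for illustration, "approx 93k" is assumed to mean 93 KiB, and it assumes the cut-off really is caused by the timeout, which has not been confirmed here. It works out the average throughput implied if 69518 bytes arrived within the timeout, and how long the full message would need at that rate.

```python
def implied_throughput_kbit(bytes_received, timeout_s):
    """Average throughput in kbit/s if bytes_received arrived in timeout_s seconds."""
    return bytes_received * 8 / timeout_s / 1000

def transfer_time_s(message_bytes, throughput_kbit):
    """Seconds needed to move message_bytes at throughput_kbit kbit/s."""
    return message_bytes * 8 / (throughput_kbit * 1000)

TRUNCATED_BYTES = 69518   # size reported in the thread
FULL_BYTES = 93 * 1024    # assumption: "approx 93k" taken as 93 KiB

# 10 s is the documented default, 30 s is what the v4.3.10 code reportedly uses.
for timeout in (10, 30):
    rate = implied_throughput_kbit(TRUNCATED_BYTES, timeout)
    needed = transfer_time_s(FULL_BYTES, rate)
    print(f"timeout {timeout:>2}s: ~{rate:.1f} kbit/s, full message needs ~{needed:.1f}s")
```

If the implied rate is far below what the link between client and server should manage, that points towards a network problem (or something other than the timeout) rather than the message simply being too large.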
J
participants (4)
- david.baldwin@ausport.gov.au
- jlaidman@rebel-it.com.au
- mailinglists@websitemanagers.com.au
- Neil.Simmonds@express-gifts.co.uk