Hi, I'm the guy who documented the original issue Robert forwarded.
On Friday, 11 June 2010 Buchan Milne wrote:
> This is quite an old version. Time to consider an upgrade?
Red Hat Enterprise is very conservative about switching package versions during the lifetime of a single RHEL major release, though they do frequently issue backported patches. We tend to avoid replacing the shipped packages unless we can't help it.
> > debug1: match: OpenSSH_3.9p1 pat OpenSSH_3.*
> > debug1: Enabling compatibility mode for protocol 2.0
> > debug1: Local version string SSH-2.0-OpenSSH_4.3
> > debug1: SSH2_MSG_KEXINIT sent
> > Read from socket failed: Connection reset by peer
> >
> > Apparently something goes wrong in the server just at the start of key exchange. The xymon ssh test reports the remote protocol and software versions, so it must converse at least that far, but I guess it doesn't go on through the key exchange.
> In quite a few years in production environments with hundreds of linux servers, I haven't seen that myself ...
> Have you managed to find a way to reproduce it? Have you filed a bug? IOW, maybe prevention of the problem is better than identification.
It won't be that easy to prevent until the kinks are worked out of RHEL's NFSv4 state recovery code. I.e., it's not some bug in sshd we are talking about. At the times in question, the box is partially moribund: some kernel services are deadlocked, but the sshd is still able to run just enough to accept connections and get as far as key exchange but no further. The details don't matter that much; the only point of reporting the issue on this list was to point out that xymon's current ssh test doesn't confirm the complete handshake, and maybe it would be a stronger test if it did.
...
Most of the suggestions I've seen so far on the list seem to involve running some arbitrary command on the host. That was my first idea too, but it needs some cautions.
- That requires giving xymon an identity with which to log in to the host. To avoid saving a password, that should be done with a public/private key pair, the public key being installed in authorized_keys on each machine to be tested.
- The identity should not be allowed to run arbitrary commands. An entry in authorized_keys can be limited to running a single fixed command.
- The discussions of whether certain commands or files are present everywhere still seem Unix-centric; you can get ssh daemons for other OSes too, which may or may not even have something called /tmp, etc.
- Because a fixed command should be assigned for the identity key anyway, OS heterogeneity isn't such a big problem; the fixed command could just be "echo OK" or whatever the equivalent is on another OS.
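To make the restricted-key idea concrete, here is a rough Python sketch that builds such an authorized_keys line. The helper name, the key text, and the xymon@monitor comment are placeholders of mine, not anything xymon ships. The command= and no-* options are standard OpenSSH authorized_keys options, but check sshd(8) for the version you actually run.

```python
def restricted_key_entry(public_key, command="echo OK"):
    """Return an authorized_keys line that pins a key to one fixed command.

    Hypothetical helper for illustration only.
    """
    # Keep the sketch simple: refuse commands that would need quote escaping.
    if '"' in command:
        raise ValueError("command must not contain double quotes")
    options = [
        'command="%s"' % command,  # sshd runs this, whatever the client asks for
        "no-port-forwarding",
        "no-X11-forwarding",
        "no-agent-forwarding",
        "no-pty",
    ]
    # Options are comma-separated, then a space, then the public key itself.
    return ",".join(options) + " " + public_key.strip()


if __name__ == "__main__":
    # Placeholder key material, not a real key.
    print(restricted_key_entry("ssh-rsa AAAAB3...placeholder xymon@monitor"))
```

The monitoring side would then log in with the matching private key and treat anything other than the expected "OK" as a failure.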
However, because that approach does require extra setup on each client machine (create a uid or pick some existing one like nobody, set up authorized_keys to accept the xymon identity and run the fixed command), an approach based on ssh-keyscan might be a reasonable compromise. It doesn't require extra setup on the host or permission to log in, and it does confirm at least that enough of the OS is working for the ssh daemon to retrieve its keys.
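For what it's worth, the ssh-keyscan variant could look something like the Python sketch below. The function names, the default timeout, and the choice of -t rsa are my assumptions; this is not how xymon's ssh test is implemented today. It only checks that at least one host key comes back within the timeout.

```python
import subprocess


def parse_keyscan_output(text):
    """Extract (host, key_type, key) tuples from ssh-keyscan output."""
    keys = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' banner/comment lines some versions print.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            keys.append((fields[0], fields[1], fields[2]))
    return keys


def host_key_check(host, port=22, key_type="rsa", timeout=10):
    """Return True if ssh-keyscan retrieves at least one host key."""
    cmd = ["ssh-keyscan", "-T", str(timeout), "-t", key_type,
           "-p", str(port), host]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout + 5)
    except (OSError, subprocess.TimeoutExpired):
        # ssh-keyscan is missing, or it hung past our own deadline.
        return False
    return bool(parse_keyscan_output(result.stdout))
```

A host in the state shown in the debug output, where the connection is reset right after SSH2_MSG_KEXINIT, should yield no key, so host_key_check() would return False even though the version banner was still readable.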
Chapman Flack