
We recently had to rebuild one of our cloud servers (we use Rackspace). All of our servers are almost identical, so it was rebuilt from a snapshot of another server. Once it was live again, I re-enabled a cron job that uses Unison to sync a couple of files that live outside source control from the original server to the newly rebuilt one. Essentially, Unison connects over SSH, compares the files on the two machines, and then copies or deletes files as needed to bring them in sync.

However, since the rebuild, I'm getting emails from Cron Daemon giving me the following error:

ssh_exchange_identification: read: Connection reset by peer
Fatal error: Lost connection with the server

The weird thing is that if I log in as the same user that the cron job runs under and SSH to the same server (using the same keys for auth), I don't see any errors. If I run Unison manually from the command line, I don't see any errors either. What's more, if I turn off Unison's silent mode, the output from a successful Unison batch job is shown in the console, and the same output arrives in an email, but I still get several other emails with the error above whenever the cron job runs.

I have checked the permissions and contents of the id_rsa/id_rsa.pub keys, authorized_keys, etc., and they seem fine.

Can anyone suggest why this might have suddenly started happening? It appears the sync is working but I'm getting several emails each time it runs with that error.

Leonard Challis
  • It doesn't look like an SSH key problem; the error looks like a connection problem. It happens during the initial SSH exchange, but the message is "Connection reset by peer", which is a TCP reset, so I would be suspicious of a connection problem and examine the network side of things first. – Matías Dec 01 '14 at 11:10
  • Is it not a problem with `known_hosts`? The rebuilt server probably has a different host key now, and if that's being checked then the connection will fail to avoid man-in-the-middle attacks. – wurtel Dec 01 '14 at 12:27

1 Answer

"Connection reset by peer" means that the remote end (the SSH server, in this case) closed the TCP stream abnormally. One way that can happen is if the remote process crashes.

"ssh_exchange_identification" means that the server and client were exchanging banner strings which identify the SSH software on each end of the connection. The client was waiting for the server to send its banner string when the TCP connection closed.
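
To look at this step in isolation, you can read the server's banner yourself without involving the ssh client. The following is a minimal bash sketch, not part of any SSH tooling: the function names `is_ssh_banner` and `read_ssh_banner` are made up for illustration, and it assumes a bash built with /dev/tcp support. If the server resets or closes the connection before sending its banner, `read_ssh_banner` fails at the same point the ssh client does.

```shell
#!/usr/bin/env bash
# Sketch: fetch the identification banner an SSH server sends right after
# the TCP connection opens. Assumes bash with /dev/tcp redirection support.

# is_ssh_banner LINE: succeed if LINE looks like "SSH-<proto>-<software>".
is_ssh_banner() {
    case "$1" in
        SSH-*-*) return 0 ;;
        *)       return 1 ;;
    esac
}

# read_ssh_banner HOST [PORT]: print the first line the server sends, or
# fail if the connection cannot be opened, is reset or closed, or no
# banner-like line arrives within 5 seconds.
read_ssh_banner() {
    local host="$1" port="${2:-22}" line=""
    {
        # File descriptor 3 is the TCP connection opened by the redirect below.
        IFS= read -r -t 5 line <&3
    } 3<>"/dev/tcp/${host}/${port}" || return 1
    line="${line%$'\r'}"   # strip the trailing carriage return
    is_ssh_banner "$line" || return 1
    printf '%s\n' "$line"
}
```

For example, `read_ssh_banner rebuilt.example.com 22` (a placeholder hostname) should print a line beginning with `SSH-` on a healthy server and return a non-zero status when the connection is dropped first. Running it several times in a row, or from cron, may help show whether the reset is intermittent.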

You really need to troubleshoot this on the server if you can. Try to find a way to reproduce the problem. Then, on the server, and assuming the server is running OpenSSH, you could run:

/path/to/sshd -d -p 1022

This starts a debug copy of sshd listening on port 1022. It'll accept a single connection from a client and it'll print debugging output. If you can reproduce the problem while connecting to this sshd instance, the debug messages should make it pretty clear what is happening.
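
Since the error only shows up under cron, one way to try reproducing it by hand is to run the client with an environment close to what cron provides. This is only a sketch under assumptions: `run_like_cron` is a hypothetical helper name, the environment shown (short PATH, /bin/sh, no SSH agent variables) approximates a typical cron setup rather than matching yours exactly, and the user and hostname in the usage comment are placeholders.

```shell
# run_like_cron COMMAND: run COMMAND via /bin/sh with a stripped-down
# environment similar to cron's (short PATH, no SSH_AUTH_SOCK, no aliases).
run_like_cron() {
    env -i HOME="${HOME:-/}" LOGNAME="${LOGNAME:-$(id -un)}" \
        SHELL=/bin/sh PATH=/usr/bin:/bin \
        /bin/sh -c "$1"
}

# Hypothetical usage against the debug sshd started above:
#   -v       prints the client's side of the connection setup
#   -p 1022  targets the debug instance instead of the normal sshd
# run_like_cron 'ssh -v -p 1022 syncuser@rebuilt.example.com true'
```

If the connection only fails when started this way, the difference between the cron environment and your login shell is a good place to look next; if it fails either way, the sshd debug output should show where the server gives up.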

Kenster