
I have an Ubuntu 12.04 laptop that takes so long to connect to various servers (in different data centres) that it's a bit of a lottery whether I'll actually get a connection. Connections between the servers themselves are instantaneous, and I've set

UseDNS no
AddressFamily inet

on the servers I'm connecting to (and rebooted them for good measure). I also added the reverse DNS name and IP of the cable connection I'm connecting from. If I connect from the laptop via telnet:

telnet my.server 22

Then the connection is also instantaneous, so an intervening firewall doesn't appear to be the problem. The behaviour is the same whether I connect by IP, by a short name in my hosts file, or by FQDN. I'm connecting over a 50 Mbps (cable, sync) connection, so bandwidth doesn't appear to be the problem, and when I do finally get a connection, it's a good, quick, stable one. Having sshd listen on another port (8000) makes no difference. Web and other connections from the laptop to the machine are also very good.
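The telnet test above only shows that the TCP handshake completes quickly. A minimal Python sketch that also times how long the server takes to send the first line of its SSH identification string (normally `SSH-2.0-...`) could help separate a slow TCP connect from a stall before the version exchange. `my.server` is the placeholder host name used above; the 120-second timeout is an arbitrary choice, and the function raises an exception if nothing arrives within it.

```python
import socket
import time
from typing import Tuple


def read_banner(sock: socket.socket) -> bytes:
    """Read up to the first newline: normally the SSH identification line."""
    buf = b""
    while b"\n" not in buf:
        chunk = sock.recv(256)
        if not chunk:  # the server closed the connection
            break
        buf += chunk
    # The identification line ends in CRLF; drop it and anything after it.
    return buf.split(b"\n", 1)[0].rstrip(b"\r")


def time_ssh_banner(host: str, port: int = 22,
                    timeout: float = 120.0) -> Tuple[float, float, bytes]:
    """Return (connect_seconds, banner_seconds, banner) for an SSH server.

    connect_seconds: time for the TCP handshake (what the telnet test shows).
    banner_seconds: time from connect until the server's first line arrives.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        banner = read_banner(sock)
        done = time.monotonic()
    return connected - start, done - connected, banner
```

Running `time_ssh_banner("my.server")` from the laptop and again from one of the servers, then comparing the second value, would show whether the delay happens before any SSH data is exchanged.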

With more verbose logging, I get the following before it hangs:

$ ssh -vvv flip
OpenSSH_5.9p1 Debian-5ubuntu1.1, OpenSSL 1.0.1 14 Mar 2012
debug1: Reading configuration data /home/anton/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to flip [xxx.xxx.xxx.xxx] port 22.
debug1: Connection established.
debug3: Incorrect RSA1 identifier
debug3: Could not load "/home/anton/.ssh/id_rsa" as a RSA1 public key
debug1: identity file /home/anton/.ssh/id_rsa type 1
debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-2048
debug1: Checking blacklist file /etc/ssh/blacklist.RSA-2048
debug1: identity file /home/anton/.ssh/id_rsa-cert type -1
debug1: identity file /home/anton/.ssh/id_dsa type -1
debug1: identity file /home/anton/.ssh/id_dsa-cert type -1
debug1: identity file /home/anton/.ssh/id_ecdsa type -1
debug1: identity file /home/anton/.ssh/id_ecdsa-cert type -1

On the server, it hangs between the following two lines, which are 66 seconds apart:

Nov  6 13:51:57 srv sshd[18472]: Connection from XXX.XXX.XXX.XXX port 51099
Nov  6 13:53:03 srv sshd[18472]: debug1: Client protocol version 2.0; client software version OpenSSH_5.9p1 Debian-5ubuntu1.1

It's at least quicker than yesterday, though!

Does anyone have any ideas here?

AntonOfTheWoods
  • Check the debug logs on the _server_. – Michael Hampton Nov 05 '13 at 22:12
  • Run `strace -t -o /tmp/ssh.trace ssh my.server` and look for the syscalls that appear to have large time gaps. My guess is that it is DNS related. – Fred the Magic Wonder Dog Nov 08 '13 at 16:39
  • Unfortunately, I can no longer test, as the behaviour disappeared completely once I got back home. I'll try to get my friend to test from his house, where it happened. Just on the off chance my tinfoil hat isn't on properly: I was connecting from Ireland to France. Is it beyond the realm of possibility that an intervening node was trying to snoop? – AntonOfTheWoods Nov 12 '13 at 19:57
  • Just in case, check that the server isn't under a sockstress-type attack or any other kind of incoming DDoS, and make sure it hasn't been compromised by an SSH rootkit. – neofutur Nov 24 '14 at 08:20
  • Just to rule out DNS, you can try connecting directly to the IP. Have you checked the GSSAPIAuthentication setting in your ssh_config? Try setting `GSSAPIAuthentication no` in your **/etc/ssh/ssh_config**. – ModuleC Nov 11 '13 at 23:23
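One rough way to follow the strace suggestion is the Python sketch below. It reads a trace written with `strace -t` (or `-tt`, or with `-f`, which adds a leading PID column) and lists every pause of at least `threshold` seconds together with the syscalls on either side. The line format matched by the regular expression is an assumption about strace's output; lines it doesn't recognise are skipped. With `-t` the timestamps have only one-second resolution, so `-tt` gives more precise gaps.

```python
import re
from typing import Iterable, List, Tuple

# Optional PID (present with `strace -f`), then HH:MM:SS from `-t`
# or HH:MM:SS.microseconds from `-tt`, then the syscall text.
_TS = re.compile(r"^(?:\d+\s+)?(\d{2}):(\d{2}):(\d{2})(\.\d+)?\s+(.*)$")


def find_gaps(lines: Iterable[str],
              threshold: float = 1.0) -> List[Tuple[float, str, str]]:
    """Return (gap_seconds, previous_syscall, next_syscall) for each long pause."""
    gaps = []
    prev_t = None
    prev_call = ""
    for line in lines:
        m = _TS.match(line.rstrip("\n"))
        if not m:
            continue  # not a timestamped line
        h, mi, s, frac, call = m.groups()
        t = int(h) * 3600 + int(mi) * 60 + int(s) + (float(frac) if frac else 0.0)
        if prev_t is not None:
            gap = t - prev_t
            if gap < 0:  # the trace crossed midnight
                gap += 86400
            if gap >= threshold:
                gaps.append((gap, prev_call, call))
        prev_t, prev_call = t, call
    return gaps
```

For example, `with open("/tmp/ssh.trace") as f: print(find_gaps(f))` prints the syscalls around each pause, which is what the comment suggests looking for.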

1 Answer


Those symptoms are what you would expect to see with broken path MTU (PMTU) discovery. The client can connect and exchange version information without problems because all of those packets are small.

But once the key exchange starts, larger packets are sent. If an intermediate router silently drops those larger packets instead of sending the ICMP error message required by the standard ("Fragmentation Needed" for IPv4, "Packet Too Big" for IPv6), the sender never learns that the data has to be sent in smaller segments. Hence the connection stalls on the first large packet.
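One way to look for such a black hole is to send pings with fragmentation forbidden and find the largest size that still gets through. The Python sketch below assumes Linux iputils `ping`, where `-M do` forbids fragmentation, `-s` sets the ICMP payload size and `-W` the reply timeout in seconds. If the largest working payload plus 28 bytes of headers is well below the interface MTU, and the larger pings simply time out rather than ping reporting an MTU-related error, that fits the silent-drop scenario described above. Some networks drop ICMP echo entirely, in which case this test tells you nothing.

```python
import subprocess
from typing import Callable


def ping_df(host: str, payload: int) -> bool:
    """Send one IPv4 ping with Don't Fragment set (Linux iputils syntax).

    Returns True if a reply came back within 2 seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_payload(probe: Callable[[int], bool],
                    low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest size for which probe(size) succeeds.

    Assumes success is monotonic in size. Returns -1 if even `low` fails.
    1472 = 1500 (Ethernet MTU) - 20 (IPv4 header) - 8 (ICMP header).
    """
    if not probe(low):
        return -1
    # Invariant: probe(low) succeeded; the answer lies in [low, high].
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

Usage would look like `largest_payload(lambda size: ping_df("my.server", size))`; separating the probe from the search keeps the binary search testable without network access.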

If this is indeed the problem, lowering the MSS or the MTU can work around it. A first step could be to modify the routing table entry in use at each end of the connection to include advmss 1220. If you don't want to modify the default route, you could simply add a more specific route with the same gateway.
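With 20-byte IPv4 and 20-byte TCP headers (no options), a 1220-byte segment fills a 1260-byte packet; over IPv6 (40-byte header) it fills exactly 1280 bytes, the minimum MTU every IPv6 link must support. That this is why 1220 was chosen is an inference, not something stated above. The Python sketch below computes the MSS for a given MTU and builds, but does not run, iproute2 commands for the two options mentioned. The gateway, interface and server address are hypothetical placeholders, and actually running the commands requires root.

```python
from typing import List

IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20  # without TCP options


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload per packet for a given MTU (no IP or TCP options)."""
    return mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - TCP_HEADER


def route_commands(gateway: str, dev: str, server: str,
                   advmss: int = 1220) -> List[List[str]]:
    """Build (not execute) the two iproute2 workarounds.

    1. Change the existing default route to advertise a smaller MSS.
    2. Or add a host route to just the server via the same gateway.
    """
    return [
        ["ip", "route", "change", "default", "via", gateway,
         "dev", dev, "advmss", str(advmss)],
        ["ip", "route", "add", f"{server}/32", "via", gateway,
         "dev", dev, "advmss", str(advmss)],
    ]
```

For instance, `route_commands("192.0.2.1", "eth0", "198.51.100.10")` uses documentation-range addresses as stand-ins; the real gateway and interface come from the output of `ip route` on each end.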

You mention that the problem disappeared by itself, which is also consistent with an MTU problem: it can disappear when BGP decides to send your packets along another path that doesn't cross the problematic router, or when the administrator responsible for that router notices and fixes the problem.

kasperd