
We have a dedicated SSH tunnel server, which supports a few dozen remotely located hosts. The hosts each create a reverse tunnel into the server with assigned port numbers, using autossh to keep the connections persistent. This gives us access to the remote hosts via the server. This has all worked great until recently...

Comcast required us to move from one connection to another. The old and new modems are the same model, but on different cable drops, and of course the new connection has a new IP address. We took the opportunity to replace the server hardware as well, but the new server box is running the same OS (Ubuntu 10.04 LTS) and OpenSSH (5.3p1) as the old. A new host key was generated and distributed to the remote hosts.

Since that change all of the tunnel connections have become flaky, and typically will stay up for only 10 or 15 minutes at the most. Autossh detects and reconnects, but this is making interactive sessions quite frustrating to use. I can't figure out where the problem lies.

Looking at the log on the server, I see: "Received disconnect from x.x.x.x: 11: disconnected by user" and then the tunnel being reestablished. Even at log level DEBUG3 I don't see anything happening before the disconnect on the server end, just the expected keepalive messages.

The connections die regularly whether or not they are in use, even while data is flowing (for example, in the middle of a large sftp). They don't all die at the same time; the drops seem pretty randomly distributed.
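One way to check whether the drops really are randomly distributed is to measure the time between consecutive disconnects per remote host. The following is a minimal sketch, not a tested tool: it assumes the sshd messages carry a classic syslog timestamp prefix (as in /var/log/auth.log on Ubuntu), and the function name `disconnect_gaps`, the `year` parameter, and the sample addresses are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line shape (syslog prefix + the sshd message quoted above), e.g.:
# "Aug 17 20:41:07 tunnel sshd[1234]: Received disconnect from 203.0.113.5: 11: disconnected by user"
DISCONNECT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s.*"
    r"Received disconnect from (?P<ip>\S+): 11:"
)


def disconnect_gaps(lines, year=2011):
    """Return {ip: [seconds between consecutive disconnects from that ip]}.

    Syslog timestamps carry no year, so one is supplied by the caller.
    """
    times = defaultdict(list)
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S"
            )
            times[match.group("ip")].append(stamp)

    gaps = {}
    for ip, stamps in times.items():
        stamps.sort()
        gaps[ip] = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
    return gaps
```

For example, `disconnect_gaps(open("/var/log/auth.log"))` would give the per-host gaps in seconds. Gaps clustered around one value would hint at some timer along the path, while a wide spread would be more consistent with intermittent packet loss.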

On the server side we have ClientAliveInterval = 30, ClientAliveCountMax = 6, and TCPKeepAlive = yes.

The remote sites are running OpenSSH 5.6p1.

I'm at my wits' end... Any ideas on where I should be looking?

Mike Blackwell
  • Start by testing basic connectivity, maybe? Leave a screen session running on a remote host with ping set to continuously ping your SSH server. Once the connection is reestablished, check whether you dropped packets. – Zoredache Aug 17 '11 at 20:53
  • Do these tunnels die regardless of activity or otherwise? – Tom Newton Aug 17 '11 at 22:55
  • I agree with ZeroDache; an SSH tunnel will disconnect if there is a blip in connectivity along the route. – tkrabec Aug 17 '11 at 21:03
  • That is Zoredache please, and this should be a comment, not an answer. Though I understand that you don't have the privilege to add comments yet. – Zoredache Aug 17 '11 at 22:26
  • Sorry about the name, when do we get the ability to comment, Zerodache? – tkrabec Aug 17 '11 at 22:49
  • 50 rep; see [here](http://serverfault.com/privileges) for reference. And, not to nitpick, but you spelled @Zoredache's name wrong again ;). – Shane Madden Aug 17 '11 at 22:57
  • @TomNewton - Yes, the tunnels die regardless of activity. Idle or in the middle of a heavy sftp, it doesn't seem to matter. – Mike Blackwell Aug 18 '11 at 13:13

1 Answer


A useful tool here for debugging network connectivity is mtr, which combines traceroute and ping. From your workstation, you would run "mtr {remote-server-ip}". The output is a table (rows and columns) that displays the latency and packet loss at each hop between your machine and the remote server. I used this the other week to prove to our ISP that they were dropping ~40% of packets on our T1, which was preventing us from establishing VPN connections.
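If you want to collect mtr results from several remote hosts rather than watching the interactive display, something like the sketch below might help. It assumes mtr is installed and on the PATH and uses its report mode (`--report`, `--report-cycles`, `--no-dns`). The helper names `build_mtr_command` and `run_mtr_report` are hypothetical, and 192.0.2.10 is only a placeholder address.

```python
import subprocess


def build_mtr_command(host, cycles=100):
    """Build an argv list for a non-interactive mtr report against host."""
    return ["mtr", "--report", "--report-cycles", str(cycles), "--no-dns", host]


def run_mtr_report(host, cycles=100):
    """Run mtr once and return its text report.

    Requires mtr to be installed; raises CalledProcessError if mtr exits
    with a non-zero status.
    """
    result = subprocess.run(
        build_mtr_command(host, cycles),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `print(run_mtr_report("192.0.2.10", cycles=50))` on a remote host, pointed at the tunnel server's address, would print one report that can be saved and compared across sites.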

Kendall