
Logically, VPN should be faster than SSH for tunneling, because:

  • It's running on UDP and not TCP (so no TCP over TCP)
  • It has compression

However, today I tested Redis replication over both methods.
I ran the test from an AWS VM in Ireland, connecting to an AWS VM in US-East.
Since my use case is Redis replication, that is exactly what I tested: I started a blank Redis server, and after it finished loading I ran `slaveof` against the other server, measuring the time between `Connecting to MASTER` and `MASTER <-> SLAVE sync: Finished with success` in the log. In between, I used

while true; do redis-cli -p 7777 info | grep master_sync_left_bytes; sleep 1; done

to get a crude estimate of the speed.
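
Concretely, each run looked roughly like this (a sketch only; the master's address/port and the Redis log path are placeholders, not taken from the actual setup):

# trigger the sync on the blank replica listening on port 7777
redis-cli -p 7777 slaveof <MASTER-IP> <MASTER-PORT>

# time the interval between these two markers in the replica's log
grep -E 'Connecting to MASTER|MASTER <-> SLAVE sync: Finished with success' /path/to/redis.log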
SSH won by a long shot: ~11MB/s compared to OpenVPN's ~2MB/s.
Does that mean that everything I researched was wrong, or have I grossly misconfigured my setup?

Update

I've made several tests with the same dataset and got these results:

  • OpenVPN
    • TCP:
      compression: 15m
      no compression: 21m
    • UDP:
      compression: 5m
      no compression: 6m
  • SSH (compression toggled with the flags sketched below the list):
      defaults: 1m50s
      no compression: 1m30s
      compression: 2m30s
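
Toggling compression on the SSH side is just a matter of the standard OpenSSH flags; a sketch, assuming the same tunnel command given under the technical specs:

# defaults (whatever the client config and server negotiate)
ssh -f XXXX@XXXX -i XXXX -L 12345:127.0.0.1:12345 -N

# compression explicitly disabled
ssh -o Compression=no -f XXXX@XXXX -i XXXX -L 12345:127.0.0.1:12345 -N

# compression explicitly enabled
ssh -C -f XXXX@XXXX -i XXXX -L 12345:127.0.0.1:12345 -N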

Update2

Here are the iperf results, with bidirectional tests (except SSH, where no return path is available); the commands used are sketched below the table:

| method           | result, out / return (Mbit/s) |
|------------------|-------------------------------|
| ssh              | 91.1 / N/A                    |
| vpn blowfish udp | 43 / 11                       |
| vpn blowfish tcp | 13 / 12                       |
| vpn AES udp      | 36 / 4                        |
| vpn AES tcp      | 12 / 5                        |
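
The tests were of this general shape (a sketch: iperf2 syntax, the default port 5001, the VPN address, and running the SSH case through the local forward are assumptions, not an exact transcript):

# on the remote end
iperf -s

# over the VPN: bidirectional test against the server's tunnel address
iperf -c 10.8.0.1 -d

# over SSH: forward the iperf port and test through the tunnel (one direction only)
ssh -f XXXX@XXXX -i XXXX -L 5001:127.0.0.1:5001 -N
iperf -c 127.0.0.1 -p 5001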

Technical specs

I'm running CentOS 6.3 on the server and CentOS 6.5 on the client.
The OpenVPN version is 2.3.2 (the same as in Ubuntu 14.10, so nothing moldy there).
My SSH tunnelling looks like:

ssh -f XXXX@XXXX -i XXXX -L 12345:127.0.0.1:12345 -N

My OpenVPN configuration files look like this:
server

port 1194
proto udp
dev tun0
topology subnet
log /var/log/openvpn.log

ca XXXX
cert XXXX
key XXXX
dh XXXX
crl-verify XXXX

cipher AES-256-CBC

server XXXX 255.255.255.0

ifconfig-pool-persist /etc/openvpn/ipp.txt
keepalive 10 120
comp-lzo
status /var/log/openvpn-status.log
verb 3
tun-mtu 1500
fragment 1300

persist-key
persist-tun

client

client

remote XXXX 1194

proto udp
dev tun
log /var/log/openvpn.log
comp-lzo

cipher AES-256-CBC
ns-cert-type server

# the full paths to your server keys and certs
ca XXXX
cert XXXX
key XXXX

tun-mtu 1500 # Device MTU
fragment 1300 # Internal fragmentation

persist-key
persist-tun
nobind
Nitz
  • Don't forget the overhead of encryption. I'd bet SSH is a bit more optimized, as it's been around longer, especially during times when bandwidth and CPU power were much more restricted. – NickW Dec 17 '14 at 14:56
  • SSH supports compression as well, so that isn't necessarily something different between OpenVPN and SSH. Have you tried disabling compression on both sides? When you perform the transfer over OpenVPN, run top or something on your client/server. Are there any obvious signs that you are maxing your CPU/memory/etc. with the VPN connection? – Zoredache Dec 17 '14 at 17:41
  • It seems unlikely for an AWS-hosted system, but there is a small possibility that UDP is getting rate-limited or something. Have you tried doing OpenVPN over TCP? – Zoredache Dec 17 '14 at 17:41
  • @Zoredache thanks for the tips. I will attempt to manually disable compression and move OpenVPN to TCP. I'm now attempting to find an explanation of how ssh tunnelling actually works - it can't be keeping the packet intact (since it redirects to another port), so I'm starting to guess it's stripping the header and rebuilding it on the other side, avoiding the TCP-on-TCP issue – Nitz Dec 17 '14 at 18:15
  • Following Zoredache's line of thought, net neutrality is dead and gone, and various providers artificially retard various services as they please. It may well be that someone, somewhere in the chain, is de-prioritising OpenVPN traffic. Unless they're doing it via deep packet inspection, though, you could test that by moving your OpenVPN to a different port number and trying the test again. – MadHatter Dec 18 '14 at 10:23
  • @Nitz TCP tunnels in ssh do not use any TCP over TCP. In fact, the ssh client is usually run with insufficient privileges to even do it. And no, ssh does not strip any TCP headers from packets, because it never even touches a TCP packet. ssh is just an application making use of the TCP stack in the kernel, like any other application. Data travels through one TCP connection from some program to the ssh client. The ssh client encrypts the data and sends it through the second TCP connection to the server. The server decrypts it and sends it through the third TCP connection to a program at the other end. – kasperd Dec 18 '14 at 10:37
  • @kasperd so SSH just opens sockets on both sides, and moves the data from the sockets from one side to another? I've been poking around the code but I can't seem to understand what happens there – Nitz Dec 18 '14 at 10:40
  • @Nitz Yes, that's how TCP tunnels work. Additionally, there is of course some multiplexing, compression, and encryption going on. – kasperd Dec 18 '14 at 10:49
  • Wow your test results comparing SSH and OpenVPN give results I would not expect at all. If you are interested in digging deeper I would use a different tool to test instead of redis. So I would try running iperf tests over your ssh/openvpn tunnels, and get the bandwidth latency results. Past that, I would probably fire up tcpdump and see if there is anything happening while you are using OpenVPN. Lots of retransmits, unusual fragments, any errors, etc. – Zoredache Dec 18 '14 at 17:50
  • @Zoredache I was shocked too. My current theory is that kasperd is right, and SSH doesn't transfer headers, only content, so it beats OpenVPN's overhead of TCP over UDP. I'm still thinking about whether that's the reason. I might do your testing, although my boss is starting to think I'm obsessing about this too much :) – Nitz Dec 18 '14 at 20:29
  • Sure, there might be a little more overhead with OpenVPN because it has the extra IP/TCP headers. But that shouldn't make a difference of 4-10 times slower. If the difference was in the 5-10% slower range, I might be tempted to blame that. The reason you might want to still investigate is that this could be a symptom of some underlying problem that might be impacting other things in a way that is less obvious. – Zoredache Dec 18 '14 at 20:35
  • Did you try with blowfish cipher rather than AES? – gparent Dec 18 '14 at 21:03
  • @gparent no. Didn't think of it, but it'll be unacceptable as a solution - I don't know my crypto that well, but it's vulnerable in long sessions / short keys. Don't think it's the crypto's fault anyway. – Nitz Dec 18 '14 at 21:10
  • blowfish is the default cipher shipping with nearly every OpenVPN install to date. If you think it is vulnerable, you should report it to their developers, but I suspect that's not the case. – gparent Dec 18 '14 at 21:23
  • Was not aware of that. When choosing a key, I saw [this](http://en.wikipedia.org/wiki/Blowfish_%28cipher%29#Weakness_and_successors) and figured I'd better avoid it – Nitz Dec 18 '14 at 21:25
  • @gparent blowfish improved things by a bit, but SSH is still winning. Updated chart – Nitz Dec 18 '14 at 21:28
  • Finished reading the `tcpdump` output. There are occasional black patches of duplicate packets, but 90% of the output is good, so that's no explanation. – Nitz Dec 18 '14 at 22:12
  • This is even more interesting because openssh (which I presume you're using) has some well-known performance problems. Maybe the systems you're using have it patched already; I'm told some linux distros do that. See https://www.psc.edu/index.php/hpn-ssh/640 . Also, your "fragment 1300" means that every TCP packet is getting fragmented at 1300 bytes. To optimize throughput, I would want to (1) only have openvpn fragment when absolutely necessary, which is probably more like 1450, and (2) see if I could get the tcp stack to set its MTU appropriately, which would be this same value. – Dan Pritts Dec 19 '14 at 17:28
  • @DanPritts I must admit I'm afraid to touch these values - the current form is "magic numbers" I picked up along the way. You're right though, I'll give it a shot. By "TCP stack" you mean the actual server's ethernet card, not the openvpn one, right? – Nitz Dec 19 '14 at 17:46
  • (1) OpenVPN should use the largest MTU available on your network path, probably 1500 bytes (maybe not). (2) The openVPN interface needs to have its MTU based on this; calculate it by subtracting the openVPN overhead from the original MTU. (3) The kernel's TCP software ("stack") needs to know about this MTU and set the TCP MSS based on this value. If you are lucky, this all happens automatically. But there are many MTU options in the docs, so it probably doesn't, not always. To test, check your tcpdump. Are all the packets in a bulk xfer the same size? Or are they alternating long and short? – Dan Pritts Dec 19 '14 at 18:49
  • @Nitz The most extreme stack of headers I could possibly imagine for a VPN still only came out to 248 bytes. Assuming the MTU is at least 1280 bytes, that is still less than 20%. And the 5-10% suggested by Zoredache is a much more likely amount of overhead being added. The number of bytes used for headers simply isn't the problem. The problem when TCP is used multiple times in the stack isn't the space used for headers; it has to do with the timeouts, retransmissions, flow-/congestion-control, and buffering happening in TCP. TCP was not designed to exist more than once in the protocol stack. – kasperd Dec 19 '14 at 19:55
  • @kasperd if you're referring to my OpenVPN performance issues, it still happens when I use UDP, so it can't be blamed on TCP on TCP – Nitz Dec 19 '14 at 20:00
  • @Nitz A different issue that can show up, even if there is no TCP on top of TCP has to do with how packets are fragmented. PMTU discovery can easily be broken by clueless administrators anywhere along the path. Even if PMTU discovery isn't broken, the presence of a VPN means there are two different layers in the stack, which could do fragmentation. But you don't want fragmentation at all. If TCP exists anywhere in the stack, you are much better off relying on segmentation by TCP. The best workaround for that tend to be to clamp the MSS on TCP-SYN packets at one end of the VPN connection. – kasperd Dec 19 '14 at 20:06
  • @kasperd and how do I do that? Currently, I commented out the fragment/mtu commands and put `mtu-test` instead. Got 1541 MTU, and iperf gives 21.4 Mb/s, still much lower than SSH – Nitz Dec 19 '14 at 20:12
  • If either end of the VPN connection is a Linux machine routing the traffic, then this should work: `iptables -A FORWARD -p tcp --tcp-flags SYN SYN -j TCPMSS --set-mss 1220`. If you already have other iptables rules, you would have to adjust accordingly. If you want to use it on an endpoint rather than on a router on the path, then you need to insert rules in both the INPUT and OUTPUT chains. – kasperd Dec 19 '14 at 20:19
  • @kasperd as far as I can understand MSS, as long as I let OpenVPN discover the MTU by itself, MSS isn't needed. Am I wrong? – Nitz Dec 19 '14 at 21:13
  • @Nitz In order for MTU to work as intended, first the endpoints of the VPN connection need to detect the PMTU between themselves. Then they have to use that to set the correct MTU on the virtual link. That virtual link will be just one of many links on a path on which PMTU has to work as well. Even if PMTU works on one of those two levels, it isn't sufficient. PMTU has to work on both levels, and that will usually be outside of your control. MSS on the other hand can be adjusted by any single router forwarding the unencrypted packet, and as long as the new value is low enough, it will work. – kasperd Dec 19 '14 at 21:22
  • @kasperd I added `mssfix`, and made sure using `ip route get to ` that the MSS (`1460`) is lower than the internet-path MTU (`1500`). No improvement in transfer rate. I guess that's not it – Nitz Dec 19 '14 at 21:37
  • @Nitz 1460 is surely too high. You need to subtract two IP headers, one TCP header, and the VPN overhead from the path MTU. Assuming the PMTU actually is 1500, then that still means your MSS need to be lower than 1440. 1400 might be low enough, but I wouldn't be certain. When tweaking MSS, I recommend first trying with 1220. I don't know the `fixmss` setting, does it modify MSS on packets in transit, or does it only affect packets originated by that host? – kasperd Dec 20 '14 at 01:11
  • @kasperd when inspecting the encrypted OpenVPN traffic, I can see that the actual OpenVPN packets are very small (~160 bytes). This stays true after every configuration tweak I make. I see very few retransmits or failures, and the packets going into the `tun` device seem decently sized (almost all are 1424). I'm thinking about leaving it that way and staying with SSH. Do you have any experience with cross-continent traffic showing that my performance is inadequate? – Nitz Dec 20 '14 at 12:21
  • @Nitz If I understand you correctly, you are saying that the unencrypted packets entering the virtual interface are 1424 bytes, but the encrypted packets sent on the physical interface are only 160 bytes. That would indicate a pretty extreme fragmentation happening at the VPN layer or the UDP/IP layer beneath it. That could certainly explain the performance problem. The packets on the virtual interface should be something on the order of 1300-1400 bytes. The packets on the physical interface should be something on the order of 1400-1500 bytes. – kasperd Dec 20 '14 at 23:13
  • In practice, if I were you I'd just stick with SSH, but I agree with kasperd. 160 byte packets on the wire is a recipe for poor performance. – Dan Pritts Dec 22 '14 at 20:32
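
Putting the MTU/MSS suggestions from the comments together, the tweaks amount to something like the following (a sketch only; the values are guesses that should be validated with tcpdump, and `mssfix` would replace the `tun-mtu`/`fragment` pair from the configs above):

# in the OpenVPN configs, instead of forcing fragmentation at 1300 bytes,
# let TCP do the segmenting and clamp the MSS below the tunnel MTU
mssfix 1400

# or clamp the MSS on a Linux endpoint/router, per kasperd's comment
iptables -A FORWARD -p tcp --tcp-flags SYN SYN -j TCPMSS --set-mss 1220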

3 Answers


Thanks to kasperd's comment, I learnt that SSH doesn't suffer from TCP-over-TCP, since it only moves the packets' payload. I wrote a blog post about it, but the most interesting thing is the netstat output, which shows that SSH indeed doesn't preserve the layer 3/4 data:

after tunneling, before connecting

backslasher@client$ netstat -nap | grep -P '(ssh|redis)'
...
tcp        0      0 127.0.0.1:20000             0.0.0.0:*                   LISTEN      20879/ssh
tcp        0      0 10.105.16.225:53142         <SERVER IP>:22              ESTABLISHED 20879/ssh
...

backslasher@server$ netstat -nap | grep -P '(ssh|redis)'
...
tcp        0      0 0.0.0.0:6379                0.0.0.0:*                   LISTEN      54328/redis-server
tcp        0      0 <SERVER IP>:22              <CLIENT IP>:53142           ESTABLISHED 53692/sshd
...

after tunneling and connecting

backslasher@client$ netstat -nap | grep -P '(ssh|redis)'
...
tcp        0      0 127.0.0.1:20000             0.0.0.0:*                   LISTEN      20879/ssh
tcp        0      0 127.0.0.1:20000             127.0.0.1:53142             ESTABLISHED 20879/ssh
tcp        0      0 127.0.0.1:53142             127.0.0.1:20000             ESTABLISHED 21692/redis-cli
...

backslasher@server$ netstat -nap | grep -P '(ssh|redis)'
...
tcp        0      0 0.0.0.0:6379                0.0.0.0:*                   LISTEN      54328/redis-server
tcp        0      0 127.0.0.1:6379              127.0.0.1:42680             ESTABLISHED 54328/redis-server
tcp        0      0 127.0.0.1:42680             127.0.0.1:6379              ESTABLISHED 54333/sshd
tcp        0      0 <SERVER IP>:22              <CLIENT IP>:53142           ESTABLISHED 52889/sshd
...

So I'm going to use SSH tunneling, since it seems that my OpenVPN isn't misconfigured or anything, just not the right tool for the job.
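
For completeness, wiring the Redis replication through the tunnel is then just a matter of pointing the replica at the local end of the forward; a sketch with illustrative port numbers, not the exact commands from my setup:

# forward a local port to the master's Redis port over SSH
ssh -f XXXX@XXXX -i XXXX -L 12345:127.0.0.1:6379 -N

# tell the local replica to sync through the tunnel
redis-cli -p 7777 slaveof 127.0.0.1 12345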

Nitz

It depends on what you are trying to achieve and what your priorities are. A VPN connects you to a network, while SSH connects you to a single machine. A VPN is also a bit more secure, with encapsulation that SSH does not do.

Also, a VPN lets all traffic go through it transparently, whereas with SSH you have to force each application through the tunnel.

Are you going to use AD at all? A VPN will let you do that with much more ease.

I prefer SSH for speedy necessities and a VPN for critical applications where I should spend the extra time.

Depending on the situation, I have also used SSH inside a VPN, in case the VPN was compromised. That way, someone probing the network would still have to get through the SSH tunneling.

rhymsy
  • I'm running redis over the tunnel, so a single port suffices for me. I was just amazed by the fact that a VPN is not always the best solution for tunneling network traffic – Nitz Sep 17 '15 at 06:10

SSH port forwarding is not tunneling in the strict sense, because no protocol-stack wrapping happens; OpenVPN, by contrast, does wrap. That is why SSH port forwarding won't suffer from the TCP-over-TCP issue.

For example, with OpenVPN the protocol stack will look similar to:

Redis
TCP
IP
OpenVPN (tun mode)
UDP
IP
Ethernet

IP (a network-layer protocol) appears here twice; that's why we call it a tunnel.
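
For comparison, with SSH port forwarding the forwarded data only travels as application payload inside a single TCP connection on each side, so the stack looks roughly like this (a sketch based on the discussion in the comments above):

Redis data (as SSH channel payload)
SSH
TCP
IP
Ethernet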

Low power