
I'm porting a legacy embedded application to Linux. This application communicates with a remote server over a TCP connection using a proprietary protocol. In addition to the application-specific messages, this protocol also implements messages that allow TCP and UDP traffic to be tunnelled through the TCP socket so that the application can serve third-party clients (e.g. an embedded web server). This, in effect, is a bespoke VPN solution.

Because the original application runs on a bare-metal system, there was no choice but to implement all the additional services as part of the application.

As part of my porting effort, I decided to use the TUN/TAP driver provided by the Linux kernel and route all the encapsulated TCP and UDP frames to a TUN interface. This, however, results in running TCP over TCP, bringing in all the problems associated with it. The original implementation doesn't have this problem because the tunnel doesn't encapsulate a full TCP/IP stack.

So my question is, is it possible to configure the TUN interface so that the TCP/IP stack running through it doesn't perform any queuing and retransmissions? Or am I stuck with the bespoke implementation that I inherited? I'm only aware of the retransmission issue with TCP over TCP. Are there any other issues that I need to be aware of with such a solution?

Dushara
  • So you're redesigning the protocol? Why not just use proper IP, then? – Michael Hampton Jan 15 '19 at 05:22
  • @Michael Hampton No I'm not redesigning the protocol. I don't have access to the server which is run by a 3rd party. I just want to handle the tunnelled TCP, UDP traffic outside the main application. Otherwise I have to implement various servers that talk through the tunnel within the application (e.g. a webserver with websocket support). Not sure what you mean by proper IP though. Could you elaborate? – Dushara Jan 15 '19 at 05:39
  • @MichaelHampton Sometimes people are stuck with legacy systems and the OP is at least aware of the problem and is taking first steps to replace said legacy system. But has to remain backward compatible with a poor design and make the best of it. In that way I see this as a very reasonable system. I have btw. had to deal with systems that were worse: A poorly designed TCP alternative running on top of an RPC protocol running on top of TCP, with the added caveat that the application only did local communication with a separate daemon responsible for host-to-host communication. – kasperd Jan 15 '19 at 14:23

1 Answer


This is a tricky situation. I hope you'll eventually be able to replace the entire application with something that's better designed.

There isn't much you can do at the TUN/TAP level, because it sits at too low a layer of the stack to know anything about retransmissions.

There are, however, things you could do in the IP-over-TCP implementation to mostly mitigate the retransmission issue. Be aware that I haven't had the need to implement such a thing myself, so there could be problems I haven't realized yet. I can however explain how the ideas work in theory.

The problem is that once the outer TCP connection loses a single packet, the receiving side will be blocked until the lost packet has been retransmitted. This will delay the inner packets, which may be detected as packet loss by the inner layer, causing retransmissions at the inner level as well, which will needlessly consume extra bandwidth.

On the receiving side

My best idea on how to deal with this is to tweak the receiving side in order to partially bypass the kernel TCP stack. You still set up the TCP connection using the kernel TCP implementation just as you would in the normal case. But on the receiving side you don't actually use the data you receive from the TCP socket. Instead you will have a thread or process which is constantly reading from the TCP socket and discarding all the received data.
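A minimal sketch of that discard-reader, in Python for illustration (the socket is assumed to be the already-established outer tunnel connection):

```python
import socket
import threading

def drain(sock):
    """Continuously read and discard data from the outer TCP socket,
    so the kernel keeps ACKing and the receive window never fills up."""
    while True:
        data = sock.recv(65536)
        if not data:          # peer closed the connection
            break

# Usage sketch: outer_sock would be the established tunnel connection.
# threading.Thread(target=drain, args=(outer_sock,), daemon=True).start()
```

The point of this thread is purely to keep the kernel's receive buffer empty; the bytes it reads are never used, because the useful data is recovered from the raw socket instead.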

In order to have packets to deliver to the TUN/TAP interface, you use a raw socket that will receive the TCP segments as seen on the wire. This process can use filters in the kernel to only see those packets it cares about, and ignore any excess packets in user space if the kernel's filtering isn't accurate enough. Your process has to do enough of the TCP reassembly itself to extract the inner packets, which it can then deliver to the TUN/TAP interface.
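The capture and filtering details depend on your setup, but the reassembly bookkeeping itself can be sketched roughly like this (Python for illustration; sequence-number wrap-around and overlapping retransmits are ignored for brevity):

```python
class Reassembler:
    """Reorders captured TCP segments by sequence number and yields the
    contiguous byte stream that carries the tunnelled inner packets."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq   # sequence number of the next byte we expect
        self.pending = {}             # seq -> payload, for out-of-order segments

    def segment(self, seq, payload):
        """Feed one captured segment; return any newly contiguous data."""
        if payload:
            self.pending[seq] = payload
        out = bytearray()
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return bytes(out)
```

A real implementation would also need a policy for giving up on a permanently lost segment and resuming at a later sequence number, which is exactly where the packet-boundary caveat mentioned below starts to matter.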

What's important here is that when an outer packet is lost only the inner packets affected by it will be lost or delayed. Your process can keep reassembling packets after the lost one in order to extract and deliver the inner packets to the TUN/TAP interface. The inner TCP stack may still retransmit a few packets, but not nearly as many as when the outer TCP connection stalls.

There are a couple of caveats to point out, which may or may not be obvious:

  • If the receive window or congestion window fills up, TCP on the sending side will stall. You cannot prevent that, but you can reduce the risk by ensuring the outer TCP connection supports selective acknowledgements (SACK).
  • Depending on the specifics of the tunnel protocol, it may be hard or even impossible to accurately identify packet boundaries after a lost packet. If this turns out to be the case for the protocol you need to implement, you may be out of luck. I'd have suggested modifying the protocol, but I understand that's not an option for you.

On the sending side

The workaround on the receiving side isn't sufficient for packets in the other direction, where you are on the sending side. You cannot prevent the outer TCP connection from stalling on the receiver when a packet is lost.

Instead, your best bet is to try to avoid unnecessary retransmissions on the inner connection. If possible, you can tweak the retransmit timers on the inner TCP connections. You want the inner TCP connection to wait at least two round-trip times before it retransmits a packet.
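Linux doesn't expose a per-socket knob for the retransmission timer, but the per-route rto_min metric from iproute2 can raise the floor of the retransmission timeout for connections routed towards the tunnel interface. A sketch of building that command (the destination and device names are made up, and actually applying the route change requires root):

```python
import shlex

def rto_min_command(dest, device, rto_min_ms):
    """Build the iproute2 command that raises the minimum retransmission
    timeout for inner TCP connections routed towards the tunnel."""
    return ["ip", "route", "change", dest, "dev", device,
            "rto_min", "%dms" % rto_min_ms]

cmd = rto_min_command("10.0.0.0/24", "tun0", 400)
print(shlex.join(cmd))
# To actually apply it (requires root):
#   subprocess.run(cmd, check=True)
```

The 400 ms value is only an example; you would pick something comfortably above two round-trip times of the outer connection.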

Completely disabling retransmits on the inner TCP connection wouldn't be a good idea, as packets can be lost before or after the tunnel, in which case the outer TCP connection won't be able to retransmit them.

A theoretical possibility, though likely a lot of work to implement, is to use the raw socket mentioned above to snoop on ACK packets. That way you can deduce which inner packets are still in flight. Every inner TCP packet would then have to be checked against the packets in flight, and if it is a retransmit of a packet which the outer TCP connection has not yet acknowledged, you silently drop the retransmit by the inner TCP connection.
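The bookkeeping for that idea might look roughly like the following sketch (Python for illustration; all names are hypothetical, and the mapping from outer ACK numbers back to the inner segments they cover is hand-waved away):

```python
class RetransmitFilter:
    """Tracks inner TCP segments already handed to the outer connection,
    and drops inner retransmits of data the outer side still has in flight."""

    def __init__(self):
        self.in_flight = set()   # (seq, length) of inner segments awaiting outer ACK

    def on_inner_segment(self, seq, length):
        """Return True if the segment should be forwarded into the tunnel,
        False if it is a retransmit of data still in flight on the outer side."""
        key = (seq, length)
        if key in self.in_flight:
            return False         # silently drop the inner retransmit
        self.in_flight.add(key)
        return True

    def on_outer_ack(self, seq, length):
        """The outer TCP connection acknowledged the bytes carrying this segment."""
        self.in_flight.discard((seq, length))
```

Once the outer connection has acknowledged the bytes carrying a segment, a later inner retransmit of the same range is treated as a genuine loss and forwarded again.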

Ignoring the problem

Chances are that the current application doesn't do any of this. It probably just does the TCP over TCP part and hopes for the best. And if it hasn't been a problem for you so far, it's probably not going to be a problem once you replace one end of the connection with a new implementation of the same protocol.

As such, it may be more productive to just try with a known suboptimal protocol and only fix it if you find it to cause real problems. This of course depends on what the consequences would be of deploying the reimplementation and running into problems later.

kasperd
  • Thank you for the detailed explanation. I was hoping there would be something like a sysfs or configfs interface but alas no. I'll have to either do your scheme (which looks complicated) or drop the TUN/TAP idea and open sockets to local services each time a virtual tunnelled socket is opened and route the traffic to and fro. – Dushara Jan 16 '19 at 20:18