
When tunneling protocols such as IPSec (in tunnel mode) or GTP are in use, they aggregate multiple IP flows into a single flow. Since there is only one flow, how do we scale up the throughput? Assigning more processor cores will not help, since packets from one flow can go to only one core. Is there any way to get around this issue?

In my case, the issue is with GTP. The eNodeBs put all the IP flows from the UEs into a GTP tunnel, so the outer 5-tuple is the same for every packet of a tunnel. Since we have only 5 to 10 eNodeBs, this results in the same number of outer IP flows. Each flow is processed by a single core so that there is no packet reordering. Because thousands of IP flows are tunneled into just 5 to 10 flows, the RSS hash no longer spreads the load evenly: core utilization becomes very skewed, with some cores above 80% and others below 10%.
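To make the effect concrete, here is a toy C sketch (FNV-1a stands in for the NIC's real Toeplitz RSS hash, and the addresses are invented): with only a handful of distinct outer 5-tuples, at most that many cores ever receive traffic, and each loaded core receives whole tunnels rather than individual UE flows.

```c
/*
 * Toy demonstration of why RSS collapses with few tunnels: with only
 * NUM_ENODEBS distinct outer 5-tuples, the hash can take at most that
 * many values, so at most that many of the NUM_CORES cores ever see
 * traffic. FNV-1a stands in for the NIC's Toeplitz hash; addresses
 * are made up.
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

static uint32_t toy_hash(const struct five_tuple *t)
{
    uint32_t words[3] = { t->src_ip, t->dst_ip,
                          ((uint32_t)t->src_port << 16) | t->dst_port };
    const uint8_t *p = (const uint8_t *)words;
    uint32_t h = 2166136261u;                 /* FNV-1a offset basis */
    for (size_t i = 0; i < sizeof(words); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    h ^= t->proto;
    h *= 16777619u;
    return h;
}

int main(void)
{
    enum { NUM_ENODEBS = 8, NUM_CORES = 16 };
    unsigned pkts_per_core[NUM_CORES] = { 0 };

    for (int i = 0; i < NUM_ENODEBS; i++) {
        /* One GTP-U tunnel per eNodeB: same destination, UDP port 2152. */
        struct five_tuple outer = {
            .src_ip = 0x0A000001u + (uint32_t)i, .dst_ip = 0x0A0000FEu,
            .src_port = 2152, .dst_port = 2152, .proto = 17
        };
        /* Every UE flow inside this tunnel lands on the same core. */
        pkts_per_core[toy_hash(&outer) % NUM_CORES] += 1000;
    }

    for (int c = 0; c < NUM_CORES; c++)
        printf("core %2d: %u packets\n", c, pkts_per_core[c]);
    return 0;
}
```

Running it shows at most NUM_ENODEBS of the NUM_CORES counters non-zero, which matches the skew we observe.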

Would it be correct to call this an inherent issue with any tunneling protocol? Is there any way to work around it? Also, what is the maximum throughput achievable with a single flow? I'm just looking for a benchmark figure on any hardware; results for IPSec in tunnel mode would be fine too. How much throughput can you achieve with a single IPSec tunnel, and what do administrators normally do to scale it up?

2 Answers


Networking can be a bottleneck with or without a tunneling protocol. At some point, the backplane transfer rate cannot keep up with the network transfer rate. Normally all traffic is routed over a single network interface, which aggregates multiple connections.

It should be possible for the encryption/decryption of individual flows to be done in parallel. Other than the additional backplane traffic, using an encrypted tunnel should not limit the network throughput significantly when there are multiple network flows.

Hardware encryption accelerators can help if the encryption overhead is an issue. However, since you appear to have sufficient CPU, that should not be an issue here.

If your application is saturating the network link, it may be possible to bond multiple interfaces to increase throughput. The switch your server is connected to will need to support channel bonding.

Using a faster network card can help, but eventually you will reach a speed that your backplane can not keep up with.

Network devices also have capacity limits, and your requirements may exceed them; in that case your network infrastructure becomes the bottleneck.

Due to memory caching, it may not make sense to balance the load evenly across multiple CPUs. An application will often run faster if it stays on a single CPU than if it is frequently scheduled onto different CPUs.

BillThor

The issue you describe is not inherent to tunnelling protocols. Rather, it is related to the presence of encryption more than to the tunnelling itself.

There is precedent for ECMP implementations inspecting fields at protocol layers higher than the one they operate at. For example, ECMP operating at the IP layer often inspects UDP and TCP port numbers. It would be no different for an ECMP implementation to inspect the IP addresses in the inner IP header of a tunnelled packet.
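As a sketch of what inspecting the inner header could look like for unencrypted GTP-U (assumptions: outer IPv4 without options, the standard GTP-U UDP port 2152, no walking of GTP extension headers, and an illustrative mixing function rather than a production Toeplitz/CRC hash):

```c
/*
 * Sketch: locate the inner IPv4 header of a GTP-U packet and hash its
 * addresses, so that different UE flows inside one tunnel can be spread
 * across cores. Real code must validate every length and handle IPv6,
 * IP options and GTP extension headers.
 */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define GTPU_PORT 2152

/* `pkt` points at the outer IPv4 header. Returns a hash over the inner
 * src/dst addresses, or 0 if the packet does not look like GTP-U. */
static uint32_t inner_ip_hash(const uint8_t *pkt, size_t len)
{
    /* outer IPv4 (20) + UDP (8) + GTP-U (up to 12) + inner IPv4 (20) */
    if (len < 20 + 8 + 12 + 20)
        return 0;

    size_t outer_ihl = (size_t)(pkt[0] & 0x0F) * 4;   /* outer IP header length */
    const uint8_t *udp = pkt + outer_ihl;
    uint16_t dport = (uint16_t)((udp[2] << 8) | udp[3]);
    if (dport != GTPU_PORT)
        return 0;

    const uint8_t *gtp = udp + 8;
    size_t gtp_hdr = 8;                    /* mandatory GTP-U header */
    if (gtp[0] & 0x07)                     /* E, S or PN flag present */
        gtp_hdr += 4;                      /* (extension headers not walked) */

    const uint8_t *inner = gtp + gtp_hdr;  /* inner IPv4 header */
    uint32_t src, dst;
    memcpy(&src, inner + 12, 4);
    memcpy(&dst, inner + 16, 4);

    /* Any reasonable mixing function will do; this one is only illustrative. */
    uint32_t h = src ^ (dst * 2654435761u);
    h ^= h >> 16;
    return h;
}
```

The resulting value can be used the same way an RSS hash would be, e.g. `core = inner_ip_hash(pkt, len) % nb_cores`, so UE flows inside one tunnel spread across cores while packets of any single UE flow still stay on one core.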

However, due to encryption, this information is not readily available without decrypting the packet. And being able to tell flows apart without knowing the encryption key would usually be considered a security flaw in the encryption algorithm. This is an important point to keep in mind, as you may have to make a compromise between performance and security.

Possible solutions I can think of include:

  • Copy the flow label from the inner IP header to the outer IP header at encryption time. This will obviously leak information about the contents of the flow label.

  • Configure multiple tunnels and perform ECMP across them. The ECMP implementation on the sending end will try to spread traffic evenly across the tunnels. However, depending on the traffic patterns, it may not be possible to distribute the traffic evenly. This uneven distribution is problematic not just because it can cause sub-optimal utilisation of the underlying network, but also because it leaks some information about the characteristics of the unencrypted traffic. The leak in this case is, however, far less significant than exposing parts of the inner IP header.

  • Allow decryption of arbitrary packets in parallel but put them back in the original order after decryption. A very simple implementation has one thread responsible for dispatching packets in a round-robin fashion to a number of decryption threads. After decryption another thread will pick up the decrypted packets in a round-robin fashion from the decryption threads putting them back into their original order.

    This approach is only possible if you consider all your threads to be a single failure domain and the communication between the threads is not subject to packet loss. Stated differently, using this approach to distribute packets round-robin between different network devices would not work.

  • Decrypt to an intermediate unencrypted format which contains the unencrypted payload and the IPSec sequence number. Decryption can be done in parallel, after which the packets are passed to a component which buffers a limited number of packets and tries to restore the original order on a best-effort basis (a sketch of such a reorder stage follows this list).

  • Redesign the higher layer protocols to be more tolerant to reordering.
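For the third and fourth options, here is a minimal best-effort reorder sketch in C. The window size, the pass-through policy for late packets, and all names are assumptions rather than taken from any particular IPSec stack (DPDK, for instance, ships a packet reorder library aimed at this pattern, but the sketch does not depend on it).

```c
/*
 * Best-effort reorder buffer: packets carry the sequence number assigned
 * at encryption time (e.g. the IPSec sequence number). Worker cores
 * decrypt in parallel; a single collector thread calls reorder_insert()
 * for each finished packet, and packets are released in sequence order,
 * skipping ahead once the window is exceeded. The buffer must start
 * zero-initialised.
 */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define WINDOW 64                         /* reorder window (assumed size) */

struct pkt { uint64_t seq; /* payload omitted in this sketch */ };

struct reorder_buf {
    struct pkt *slot[WINDOW];
    bool        used[WINDOW];
    uint64_t    next_seq;                 /* next sequence number to release */
};

static void release(struct pkt *p)
{
    printf("tx seq %llu\n", (unsigned long long)p->seq);
}

/* Called by the collector for every decrypted packet, in any order. */
static void reorder_insert(struct reorder_buf *rb, struct pkt *p)
{
    if (p->seq < rb->next_seq) {          /* arrived too late: pass through */
        release(p);
        return;
    }
    while (p->seq >= rb->next_seq + WINDOW) {
        /* Window full: give up waiting for the packet at the head. */
        unsigned head = rb->next_seq % WINDOW;
        if (rb->used[head]) {
            release(rb->slot[head]);
            rb->used[head] = false;
        }
        rb->next_seq++;
    }
    rb->slot[p->seq % WINDOW] = p;
    rb->used[p->seq % WINDOW] = true;

    /* Drain everything that is now contiguous from the head. */
    while (rb->used[rb->next_seq % WINDOW]) {
        unsigned head = rb->next_seq % WINDOW;
        release(rb->slot[head]);
        rb->used[head] = false;
        rb->next_seq++;
    }
}
```

Keeping the ordering logic in a single collector thread keeps it off the decryption workers' critical path; the trade-off is the buffering latency and the possibility of releasing packets out of order when the window overflows.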

kasperd
  • Our implementation is based on Intel DPDK, which imposes the restriction that packets from a particular flow can be handled by only one particular core. But if you look at some network processors, their processing model allows packets from a single flow to be handled by multiple cores in parallel while preserving the ingress order at egress. I guess my next question should be: "Is there any way to handle this with Intel DPDK?" – Satheesh Paul Antonysamy Dec 04 '16 at 13:04