When tunneling protocols such as IPsec (in tunnel mode) or GTP are in use, they aggregate multiple IP flows into a single outer flow. Since the network sees only one flow, how do we scale up the throughput? Assigning more processor cores does not help, because all packets of one flow are steered to a single core. Is there any way to get around this?

In my case, the issue is with GTP. The eNodeBs put all the IP flows from the UEs into a GTP tunnel, so the outer 5-tuple is the same for every packet from a given eNodeB. Since we have only 5 to 10 eNodeBs, we end up with only 5 to 10 distinct flows. Each flow is processed by a single core so that packets are not reordered. With thousands of IP flows collapsed into just 5 to 10 tunnel flows, the RSS hash has too few distinct inputs to spread the load evenly, so core utilization becomes very uneven: a few cores run at >80% while others sit at <10%.
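To make the effect concrete, here is a rough sketch of what I think is happening. It is not a model of any real NIC: CRC32 stands in for the Toeplitz hash that RSS hardware typically uses, and the addresses, the core count, the number of UE flows, and the assumption that both ends use UDP port 2152 are made up for illustration.

```python
import zlib
from collections import Counter

NUM_CORES = 8  # hypothetical core count

def core_for(flow_key: tuple) -> int:
    """Pick a core from a 5-tuple. CRC32 is only a stand-in for the real RSS hash."""
    return zlib.crc32(repr(flow_key).encode()) % NUM_CORES

def distribution(flow_keys) -> list:
    """Number of flows landing on each core, indexed by core id."""
    counts = Counter(core_for(key) for key in flow_keys)
    return [counts.get(core, 0) for core in range(NUM_CORES)]

# Outer headers as the NIC sees them: one tunnel flow per eNodeB.
# (src IP, dst IP, src port, dst port, protocol); all values are invented.
outer_flows = [(f"10.0.0.{i}", "10.0.1.1", 2152, 2152, 17) for i in range(1, 9)]

# Inner UE flows hidden inside the tunnels (also invented).
inner_flows = [
    (f"172.16.{i // 250}.{i % 250 + 1}", "192.0.2.10", 40000 + i % 1000, 443, 6)
    for i in range(4000)
]

outer_dist = distribution(outer_flows)
inner_dist = distribution(inner_flows)

print("tunnel flows per core:", outer_dist)
print("UE flows per core:    ", inner_dist)
```

With only 8 hash inputs, it is very unlikely that each of the 8 cores receives exactly one tunnel, so some cores carry several eNodeBs and others carry none. The 4000 inner flows would spread far more evenly if the hash could see them.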
Would it be correct to call this an inherent issue with any tunneling protocol? Is there any way to work around it? Also, what is the maximum throughput achievable with a single flow? I'm just looking for a benchmark figure on any hardware. Results for IPsec in tunnel mode would be welcome too: how much throughput can you achieve with a single IPsec tunnel, and what do administrators normally do to scale it up?