
I have an issue with HW-offloaded balancing of network packets between CPU cores in DPDK.
In my case I have only a few endpoints, fewer than the number of CPU cores.
All the incoming traffic is local (not transit/forwarded), so I can't balance it on IP/UDP header fields, i.e. 5-tuple or 2-tuple RSS doesn't fit my needs.

The good news is that my node does tunneling over UDP and receives packets whose UDP payload contains a tunnel identifier (4 bytes). Round-robin RSS is not suitable, because the buffers associated with tunnels are per-core. So I think packets can be dynamically balanced by this "tunnel ID", i.e. based on that single 4-byte field inside the UDP payload, so that packets with the same ID arrive at the same CPU core. Hashing, of course, is not necessary for this (perhaps that is my logical mistake).

So my question is: is there any way to configure such dynamic balancing (based on a hash of one 4-byte field at a fixed offset inside the UDP payload, or a hash of this field plus some other fields, e.g. the source IP) between CPU cores on a modern NIC (Intel X710 or Mellanox ConnectX-4 are the most interesting to me)? If there is, please advise how to configure it.

UPD

I concluded that hashing is not necessary, but that conclusion is probably erroneous: I don't know the tunnel IDs in advance, they arrive in the packet headers.
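To illustrate what I'm after in rte_flow terms, here is a minimal sketch of per-tunnel steering, assuming the PMD supports the RAW pattern item (that support is hardware/PMD-dependent, and TUNNEL_ID_OFFSET plus the function name are placeholders I made up). Since I don't know the IDs in advance, such a rule would have to be inserted at runtime when a new ID first appears, e.g. from a default slow-path queue:

    /* Hypothetical sketch: match the 4-byte tunnel ID inside the UDP
     * payload with a RAW item and steer the flow to a given RX queue.
     * Whether the PMD accepts RAW payload matching must be checked
     * with rte_flow_validate(); i40e and mlx5 behave differently. */
    #include <rte_flow.h>

    #define TUNNEL_ID_OFFSET 0   /* assumed offset of the ID in the UDP payload */

    static struct rte_flow *
    steer_tunnel_to_queue(uint16_t port_id, uint32_t tunnel_id_be,
                          uint16_t queue_id, struct rte_flow_error *err)
    {
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item_raw raw_spec = {
            .relative = 1,                    /* offset counts from the end of the UDP header */
            .offset   = TUNNEL_ID_OFFSET,
            .length   = sizeof(tunnel_id_be),
            .pattern  = (const uint8_t *)&tunnel_id_be,
        };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_UDP },
            { .type = RTE_FLOW_ITEM_TYPE_RAW, .spec = &raw_spec },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_queue queue = { .index = queue_id };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        if (rte_flow_validate(port_id, &attr, pattern, actions, err) != 0)
            return NULL;   /* the PMD rejected the rule */
        return rte_flow_create(port_id, &attr, pattern, actions, err);
    }

With the RX queues pinned one per core, each tunnel would then land on a fixed core without any software dispatch.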

budoattack

1 Answer


I am new to DPDK and haven't explored tunneling, so I'm not sure whether this will be useful for you.

From my experience, what we did was run a polling thread that receives all packets and then routes the UDP packets (following a certain algorithm) to processing threads running on different cores. The basic concept is from the "client-server multi-process model", which is an official DPDK example.

Our work is in the VoIP field, so we wanted to make sure that whenever a call starts, all subsequent UDP packets of that call are processed on the same core for better cache locality, and that redistribution happens only when a core reaches its processing limit. So for the signalling packet we assigned an ID to that call, which I later used to redirect all of its packets to that specific core's thread (sketched below).
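A rough sketch of that dispatch loop (N_WORKERS and extract_call_id are placeholders, and the simple modulo stands in for our actual assignment/rebalancing policy):

    /* One RX core pulls packets from the NIC and hands each one to a
     * worker ring chosen by the call ID, so all packets of one call
     * stay on one core. */
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    #define BURST_SIZE 32
    #define N_WORKERS  4

    extern struct rte_ring *worker_rings[N_WORKERS];            /* one ring per worker core */
    extern uint32_t extract_call_id(const struct rte_mbuf *m);  /* parses the UDP payload */

    static void
    rx_dispatch_loop(uint16_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];

        for (;;) {
            uint16_t n = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
            for (uint16_t i = 0; i < n; i++) {
                uint32_t id = extract_call_id(bufs[i]);
                /* same ID -> same worker, for cache locality */
                if (rte_ring_enqueue(worker_rings[id % N_WORKERS], bufs[i]) != 0)
                    rte_pktmbuf_free(bufs[i]);  /* ring full: drop */
            }
        }
    }

Each worker core then drains its own ring with rte_ring_dequeue_burst() and processes the packets there.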

Nafiul Alam Fuji
  • This is called "programmatic RSS"; it changes the processing model from run-to-completion to pipeline and has some overhead. Hardware-offloaded balancing is obviously faster... I want to enable hardware-offloaded RSS – budoattack Apr 03 '23 at 09:33