I am working on a project (hosted on GitHub) that uses eBPF to filter, look up, redirect, and drop packets based on SRv6 routing. The eBPF code runs on a Mellanox ConnectX-5 NIC to provide the SRv6 functionality.

My expectation is that the Mellanox ConnectX-5 will look at the outer IPv6 header of an SRv6 packet and use RSS to spread packets across the RX queues. This would allow me to run XDP on multiple cores for processing.

My current result is that only one CPU core is used, even when the SRv6 traffic consists of multiple flows (the load is the same as with a single flow).

My question is: how can I balance the load across CPUs even for SRv6 packets?

An example of the kind of answer I am expecting is how to enable RSS on the IPv6 src/dst addresses only, etc.

thanks.

takeru ta
  • Can you help me understand the question you are asking? As I understand it: `A packet is received at the NIC; if RSS for IPv6 is enabled, the NIC hashes it and puts it on the appropriate RX queue. With multiple queues, packets are steered to multiple cores, and the XDP instance running on each queue handles its packets`. Why are you stating that `only 1 CPU core is enabled for multiple RX queues for NIC?` – Vipin Varghese Feb 02 '21 at 15:07
  • @VipinVarghese Hi Vipin, thanks for the comment and the rephrasing. Your understanding is correct: a normal IPv6 packet loads multiple CPUs. The problem case is SRv6 packets, which are processed by only one CPU. – takeru ta Feb 02 '21 at 20:45
  • As far as I can recollect, Mellanox has RSS for IPv6; please check `https://docs.mellanox.com/display/WINOFv55053000/RSS+Monitoring`. In the reported case it could be that `1. RSS for IPv6 is set for TCP/UDP but not set for ipv6 only` or that `the SRv6 packet you are sending is always having the same header`. Please note I am not aware of `mlx5 supporting RSS on the inner dst IP of an SRv6 packet`. To achieve that, one has to use a NIC which supports RSS on `offset values or RAW bytes`. Can you please confirm that `NIC is enabled with ipv6 only and you are sending different src/dst ip`? – Vipin Varghese Feb 03 '21 at 02:43
  • I will answer them in order. `RSS for IPv6 is set for TCP/UDP but not set for ipv6 only` => I don't know how to set this. `the SRv6 packet you are sending is always having the same header` => Sure, the inner address remains the same, but I don't think it's relevant, because the structure of an SRv6 packet is [eth][ipv6][SRH (an IPv6 extension header)][Payload]. `NIC is enabled with ipv6 only and you are sending different src/dst ip` => Yes. I actually did the measurements. ipv6 only: RSS working; ipv6/udp: RSS working; ipv6/tcp: RSS working; ipv6/ipv6: RSS not working. – takeru ta Feb 03 '21 at 10:47
  • Looking for the linux code, it seems to work with IPIPv6. But it's not working... I may have the key.(The structure of SRv6 is similar to IPIPv6, so I thought it could be applied). [net/mlx5e: Support RSS for IP-in-IP and IPv6 tunneled packets](https://github.com/torvalds/linux/commit/a795d8db2a6d3c6f80e7002dd6357e6736dad1b6) – takeru ta Feb 03 '21 at 10:58
  • If the SRv6 packets have the same `src-ip & dst-ip`, the RSS value generated for any hash-reta will be the same. Hence, as per my understanding, the packets will always fall to the same RX queue. For segment routing the packet format is `ipv6 header->next_header = 43, extension header (different dst address)` and not `ipv6_ipv6`. I can be on Skype to help too. – Vipin Varghese Feb 03 '21 at 14:00
  • Does the same src-ip & dst-ip mean that the ipv6 address of the outer header is the same? I understand that. I'm sorry for the misunderstanding. I was talking about the possibility that it might be possible because of the similar structure. Thank you. I would ask you to help me, but I don't speak English.... – takeru ta Feb 03 '21 at 19:05
  • I understood the claim that RSS can be enabled in ipv6 src-addr and dst-addr to enable it for SRv6. Is this correct? If so, could you please tell me how to enable it with this? When I measured it, I set the following settings `ethtool -L ens4f1 combined $(nproc --all);ethtool -K ens4f1 rxhash on;ethtool -K ens4f1 ntuple on; for proto in tcp4 udp4 tcp6 udp6; do /sbin/ethtool -N ens4f1 rx-flow-hash $proto sd;done;` – takeru ta Feb 03 '21 at 19:24
  • Thank you for understanding that IP RSS works on SRC-IP and DST-IP and not on the extension header for SRv6. Can you please let me know whether the packets under test are sent with the same SRC-IP and DST-IP or with different ones? I can be available on Skype for debugging. – Vipin Varghese Feb 04 '21 at 01:20
  • `Can you please let me know for packets under test, are you sending with same SRC-IP and DST-IP or different?` => Different Src-IPs are used. However, Dst-IP is always the same.Thank you very much. I just think it's a last resort (I live in JST, so I know it's hard to fit in time) – takeru ta Feb 04 '21 at 18:28
  • I have marked this question `needs more clarity` because 1. XDP Srv6 code is not shared, 2. pcap file is not shared, 3. there is no screenshot to show case XDP is running only one core, 4. in the live debug no information is shared. – Vipin Varghese Feb 05 '21 at 11:07
  • I see that you have edited and modified from `SRV6` to `outer IP dst/src` RSS. I clearly understand why you have done this too. – Vipin Varghese Feb 06 '21 at 02:21

1 Answer

There is no issue with the Mellanox NIC supporting basic RSS, i.e. outer src-IP + dst-IP + protocol, but the expectation that the Mellanox NIC will do RSS on SRv6 header content is incorrect. With the current library (verbs) and firmware, RSS and RPS can be validated with:

  1. `ethtool -S [interface-name] | grep packets | grep rx` - shows the HW RSS spread across multiple RX queues
  2. `grep mlx5 /proc/interrupts` - shows the queue-to-CPU mapping
  3. `ethtool --show-rxfh-indir [interface-name]` - shows the flow hash setting
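
Since the first check reads cumulative counters, it helps to compare two snapshots taken before and after the test traffic. Below is a minimal sketch of that comparison. It assumes the per-queue counters appear as `rx<N>_packets` in the `ethtool -S` output (please check the actual names on your driver version); the function names are hypothetical helpers, not part of any tool.

```python
import re


def per_queue_rx_packets(ethtool_stats):
    """Parse `ethtool -S <iface>` text into {queue_index: rx_packets}.

    Assumption: per-queue counters are named rx<N>_packets.
    Aggregate counters such as rx_packets are ignored.
    """
    counters = {}
    for line in ethtool_stats.splitlines():
        match = re.match(r"\s*rx(\d+)_packets:\s*(\d+)\s*$", line)
        if match:
            counters[int(match.group(1))] = int(match.group(2))
    return counters


def queue_deltas(before, after):
    """Per-queue increase between two snapshots of the counters."""
    return {queue: after[queue] - before.get(queue, 0) for queue in after}


def active_queues(deltas):
    """Queues that received at least one packet between the snapshots."""
    return sorted(queue for queue, delta in deltas.items() if delta > 0)
```

Take one snapshot before starting the generator and one after stopping it, then pass both through `per_queue_rx_packets` and `queue_deltas`. If `active_queues` returns a single queue for the SRv6 run but several for the plain IPv6 run, the difference is in the hardware spread and not in the XDP program.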

Based on the comments, there is also a gap in understanding of the SRv6 packet format. The format is Ethernet + IPv6 (next header = 43, i.e. Routing) + SRv6 header (whose own next header can be IP/TCP/UDP).

Hence RSS is done on the outer src-IP + dst-IP + protocol (43), and packets with different hashes are spread to different queues.
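
To illustrate what hashing on the outer addresses means, here is a rough software sketch of a Toeplitz hash, the function commonly used for RSS. Everything specific in it is an assumption for illustration: `EXAMPLE_KEY` is only an example 40-byte key (the real one can be read with `ethtool -x`), the input is limited to outer src + dst address, and the queue is picked with a plain modulo, whereas a real NIC looks up the indirection table. It does not claim to reproduce what the mlx5 firmware does for packets carrying a routing extension header.

```python
import ipaddress

# Example 40-byte hash key, for illustration only (assumption).
EXAMPLE_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcbae7b30b4"
    "77cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(key, data):
    """32-bit Toeplitz hash: for every set bit i of the input (MSB first),
    XOR in the 32-bit window of the key that starts at bit i."""
    key_bits = len(key) * 8
    if len(data) * 8 + 31 > key_bits:
        raise ValueError("key too short for this input length")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_queue(src, dst, n_queues, key=EXAMPLE_KEY):
    """Pick a queue from the outer IPv6 src/dst only (simplified: modulo
    instead of the indirection table a real NIC uses)."""
    data = ipaddress.IPv6Address(src).packed + ipaddress.IPv6Address(dst).packed
    return toeplitz_hash(key, data) % n_queues
```

In this model nothing inside the SRH enters the hash input, so two packets that differ only in their segment list land on the same queue, while packets with different outer source addresses generally produce different hash values.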

Now, with an XDP program loaded on the interface, one can filter for SRv6 headers, apply a simple XOR hash or murmur hash, and then redirect to AF_XDP sockets or to an interface.
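
The filter-and-hash step can be prototyped in user space before writing it as an XDP program. The sketch below is in Python purely to show the logic; a real XDP program would be written in C with bounds checks on every access. It assumes an untagged Ethernet frame with the SRH directly after the IPv6 header, and it hashes the whole segment list, which is only one possible choice of hash input. All names are hypothetical.

```python
import struct

ETH_HLEN = 14
IPV6_HLEN = 40
ETH_P_IPV6 = 0x86DD
NEXTHDR_ROUTING = 43   # IPv6 Routing extension header
SRH_ROUTING_TYPE = 4   # Segment Routing Header


def srh_segments(frame):
    """Return the SRH segment list as 16-byte values, or None if the
    frame is not Ethernet + IPv6 + SRH."""
    srh = ETH_HLEN + IPV6_HLEN
    if len(frame) < srh + 8:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IPV6 or frame[ETH_HLEN + 6] != NEXTHDR_ROUTING:
        return None
    hdr_ext_len, routing_type = frame[srh + 1], frame[srh + 2]
    if routing_type != SRH_ROUTING_TYPE:
        return None
    count = frame[srh + 4] + 1          # Last Entry is zero-based
    end = srh + 8 + 16 * count
    if 16 * count > hdr_ext_len * 8 or end > len(frame):
        return None
    return [frame[srh + 8 + 16 * i: srh + 8 + 16 * (i + 1)] for i in range(count)]


def xor_fold(data):
    """Simple XOR hash over 32-bit words (len(data) must be a multiple of 4)."""
    value = 0
    for (word,) in struct.iter_unpack("!I", data):
        value ^= word
    return value


def pick_target(frame, n_targets):
    """Index of the AF_XDP socket or interface to redirect to, or None
    for non-SRv6 frames."""
    segments = srh_segments(frame)
    if segments is None:
        return None
    return xor_fold(b"".join(segments)) % n_targets
```

Note that this only spreads the work after the packet has already arrived on some RX queue, so the core that runs the XDP program for that queue still touches every packet before the redirect.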

Hence the whole expectation and assumption is incorrect.

[EDIT-1] Based on the live debug, we spent 1.5 hours explaining and educating on the same.

[EDIT-2] Addressing the comments raised:

1. `It refers to what the rx-flow-counter has already accumulated, not the increase in SRv6 packets`: In the live debug, @takeru used the TREX packet generator to send packets to the NIC, with the packet format ETH + SRC-IP-1 ... SRC-IP-n + DST-IP + SRv6. With a direct interface-to-interface connection, no packets other than SRv6 packets will be received.

2. `In fact, if you check the load on the CPU in the case of SRv6 packets, you will see that only one CPU core is being loaded`: In the live debug, @takeru did not run top/htop, so this is new information. @takeru was only trying to understand whether RSS on the outer IP was happening. I have requested a screenshot of the CPU usage and a tcpdump.

3. `If it is only IPv6, the CPU load will be applied to other cores`: A request has been made to run a simple XDP-eBPF program which redirects/drops IPv6-SRv6 packets. @takeru has not run it yet.

4. `Only IPv6 and ip / udp cases have increased the value count by the debugging method you mentioned The same thing happens with SRv6 in linux kernel`: I have pointed out to @takeru that the TREX packet he is generating has the format ETH + IPv6 + next-hdr routing + SRv6 header + next-header UDP. Hence the kernel statistics are updated as IPv6/UDP, since the packet is not TCP, SCTP, or an unknown protocol.

Note: takeru's reference github project

Vipin Varghese
  • No, it's wrong. I don't think you understand the behavior of XDP. I'll say it again. 1. `It refers to what the rx-flow-counter has already accumulated, not the increase in SRv6 packets` 2. `In fact, if you check the load on the CPU in the case of SRv6 packets, you will see that only one CPU core is being loaded` 3. `If it is only IPv6, the CPU load will be applied to other cores` 4. `Only IPv6 and ip / udp cases have increased the value count by the debugging method you mentioned` The same thing happens with SRv6 in the Linux kernel, so you can check it yourself if you want. – takeru ta Feb 05 '21 at 10:12
  • @takeruta I have already shared and explained many times: if the HW NIC does RSS, the packets will be sent to multiple queues based on hash values. If you have loaded XDP per port-queue, the corresponding eBPF program will be running. Depending upon the logic of the eBPF program, you can drop, send to the kernel, send to an interface, or load balance to multiple AF_XDP sockets. Whether the HW NIC does RSS depends upon the packet contents. – Vipin Varghese Feb 05 '21 at 12:44
  • That comment of yours is correct. I have the same understanding. So you understand that you need to use RSS to distribute the load to the CPUs, right? My point is that the `rx-flow-counter in your answer is counted by other RSS driven packets, not SRv6 driven by RSS.` top/htop was running. `It's strange that it existed to share the screen and you didn't check it.` I don't want to talk about it not running in the first place; I want you to tell me how to get it running. If there's no way to do it, stop writing that answer. My question is how to enable RSS for SRv6 packets. – takeru ta Feb 06 '21 at 00:09
  • @takeruta I have explained in comments, in the answer, and in the live debug multiple times that the current expectation of `mlx5 to do SRv6 RSS` is `incorrect`, because it supports more generic RSS like `outer-ip/outer-ip-tcp/udp/sctp, inner-ip, inner-ip-tcp/udp/sctp`. There are programmable NICs which allow RSS on offset fields in packets: https://doc.dpdk.org/guides/prog_guide/rte_flow.html. But you are not using DPDK to program this; you are using the kernel to run the same. – Vipin Varghese Feb 06 '21 at 01:59