
Rewriting the question on 14/Jun/2021 for clarity.

Setup details:

  • Linux machine running CentOS 7 (release 7.9.2009, see output below), acting as a node in a Kubernetes cluster. Docker is the container runtime. More details of the machine:
 
[root@node19823 ~]# uname -a
Linux node19823 3.10.0-693.2.2.rt56.623.el7.x86_64 #1 SMP PREEMPT RT Thu Sep 14 16:53:49 CEST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@node19823 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.5", GitCommit:"e338cf2c6d297aa603b50ad3a301f761b4173aa6", GitTreeState:"clean", BuildDate:"2020-12-09T11:18:51Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.5", GitCommit:"e338cf2c6d297aa603b50ad3a301f761b4173aa6", GitTreeState:"clean", BuildDate:"2020-12-09T11:10:32Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
[root@node19823 ~]#

[root@node19823 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)

Details of my Application

  • 10 VFs are created on a Mellanox (mlx5_core) 100G PF.
  • DPDK version is 19.11.
  • Two pods Pod1 and Pod2 run on this machine.
  • Pod1 uses VF1; Pod2 uses another VF, VF2, of the same 100G PF. VFs are assigned to the pods using the Kubernetes SR-IOV Device Plugin and SR-IOV CNI plugin.
  • Pod1 and Pod2 are supposed to exchange full-duplex UDP traffic.
  • Pod1 uses DPDK-PMD-over-VF1 to both send and receive UDP packets. VF1 is set up with 1 rxQ and 1 txQ for this purpose.
  • Pod2 uses DPDK-PMD-over-VF2 for sending only. VF2 is also set up with 1 rxQ and 1 txQ (see the sketch right after this list). For receiving UDP traffic, Pod2 uses a plain UDP socket bound to the same IP address as VF2.
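
For reference, this is roughly how the 1 rxQ / 1 txQ setup on each VF looks (a minimal sketch against the standard DPDK 19.11 ethdev API; the function name setup_vf_port and the descriptor sizes are illustrative, not the exact application code):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    #define RX_DESC 1024   /* ring sizes are illustrative */
    #define TX_DESC 1024

    /* Configure the VF port (already handed to the pod by the SR-IOV device
     * plugin / CNI) with exactly 1 rxQ and 1 txQ, then start it.  The mempool
     * is assumed to come from rte_pktmbuf_pool_create() during init. */
    static int setup_vf_port(uint16_t port_id, struct rte_mempool *pool)
    {
        struct rte_eth_conf port_conf = { 0 };   /* defaults, no extra offloads */
        int ret;

        ret = rte_eth_dev_configure(port_id, 1 /* nb_rxq */, 1 /* nb_txq */, &port_conf);
        if (ret < 0)
            return ret;

        ret = rte_eth_rx_queue_setup(port_id, 0, RX_DESC,
                                     rte_eth_dev_socket_id(port_id), NULL, pool);
        if (ret < 0)
            return ret;

        ret = rte_eth_tx_queue_setup(port_id, 0, TX_DESC,
                                     rte_eth_dev_socket_id(port_id), NULL);
        if (ret < 0)
            return ret;

        return rte_eth_dev_start(port_id);
    }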

Below are the traffic combinations tried

  1. pod2-dpdk-pmd-tx-over-vf2 --> pod1-Dpdk-pmd-rx-over-vf1 ==> SUCCESS.
  2. pod1-dpdk-pmd-tx-over-vf1 --> pod2-udp-socket-rx-bound-to-vf2 ==> FAILURE.
  3. pod1-udp-socket-tx-bound-to-vf1 --> pod2-udp-socket-rx-bound-to-vf2 ==> SUCCESS.

Looking for help in understanding the reason for the FAILURE in combination 2. I have verified the following:

  • A. The Ethernet/IP/UDP headers filled in by the Pod1 DPDK sender (pod1-dpdk-pmd-tx-over-vf1) are correct. I forwarded the packets constructed by this app to Wireshark, and Wireshark did not show any errors.
  • B. tcpdump inside Pod2 does not show the packets sent by Pod1. Since the destination MAC address is correct, I expected the packets to at least show up on the destination pod, even if they were then dropped in the higher layers of the pod's TCP/IP stack. Why do the packets not appear in the capture at all?
  • C. Am I missing any settings (PMD APIs or ethtool commands?) needed to hand all received packets on VF2 over to the Linux TCP/IP stack? As said above, I want to send using the DPDK PMD but receive over a UDP socket. (See the sketch after this list for the kind of PMD-level control I mean.)
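
To illustrate the kind of PMD-level control point C is asking about: my understanding of the mlx5 bifurcated model is that rte_flow_isolate() restricts the DPDK queues to explicitly matched flows and leaves everything else to the kernel netdev. The sketch below (DPDK 19.11 names; isolate_dpdk_rx is an illustrative helper) only shows that API; whether this, promiscuous/MAC settings, or something else entirely is the right knob for the VF2 case is exactly the open question:

    #include <stdio.h>
    #include <rte_ethdev.h>
    #include <rte_flow.h>

    /* Enter "isolated" flow mode on a port: the DPDK queues then receive only
     * packets explicitly matched by rte_flow rules created by the application;
     * on a bifurcated PMD such as mlx5, other traffic stays with the kernel
     * netdev.  Recommended to be called before rte_eth_dev_configure(). */
    static int isolate_dpdk_rx(uint16_t port_id)
    {
        struct rte_flow_error err;
        int ret = rte_flow_isolate(port_id, 1 /* enable isolation */, &err);

        if (ret != 0)
            printf("rte_flow_isolate(port %u) failed: %s\n",
                   (unsigned)port_id, err.message ? err.message : "unknown");
        return ret;
    }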

Does a DPDK sender work seamlessly with a non-DPDK receiver? In my opinion it should, since the sender and receiver don't always have control over each other's design.

As said above, there are no problems if both sender and receiver use socket system calls.

/proc/net/dev in Pod1 (VF1 device name inside Pod1 = netsxu):

[root@cs-dpdk-sender-1-64c7d64877-5ml7p bin]# cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
netsxu: 1431238    4207    0    0    0     0          0      4207    32796     190    0    0    0     0       0          0
netf1u:  949856    2808    0    0    0     0          0      2808    36222     204    0    0    0     0       0          0
  eth0: 3017151   14452    0    0    0     0          0         0 20239378037 15505655    0    0    0     0       0          0
    lo: 7618450    3500    0    0    0     0          0         0  7618450    3500    0    0    0     0       0          0
nete1c: 1120380    5599    0    0    0     0          0      2775   211850    3039    0    0    0     0       0          0
neto1c: 1485613    4337    0    0    0     0          0      4265    36142     233    0    0    0     0       0          0 

/proc/net/dev in Pod2 (VF2 device name inside Pod2 = netsxu; the UDP socket is bound to netsxu):

[root@cs-udp-rx-1-8466587c76-mfc8r bin]# cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
netsxu:  950624    2820    0    0    0     0          0      2820    34674     197    0    0    0     0       0          0
netsim: 1968514    5816    0    0    0     0          0      5815    35166     200    0    0    0     0       0          0
  eth0:    5049      34    0    0    0     0          0         0     2936      30    0    0    0     0       0          0
    lo:     931      19    0    0    0     0          0         0      931      19    0    0    0     0       0          0
[root@cs-gnb-1-dltg-1-8466587c76-mfc8r bin]#
  • Assuming machine 2 is Linux, please update the question with the following information: `cat /proc/net/dev`, `ethtool [nic port on machine2]`. – Vipin Varghese Jun 07 '21 at 15:38
  • are there any updates? – Vipin Varghese Jun 08 '21 at 03:26
  • Thanks for responding Vipin, will update with the info shortly. – kvp Jun 08 '21 at 05:21
  • Hi, I forgot to add an important piece of info. The sender and receiver run in 2 separate Kubernetes pods. The DPDK sender runs inside Pod_1 and uses a Mellanox VF, say VF_1. The UDP receiver (socket system calls) runs in another pod, Pod_2, and uses another Mellanox VF, say VF_2. Pod_1 and Pod_2 are on the *same* Linux machine (CentOS). VF_1 and VF_2 are created out of the same Mellanox PF device. Sorry for not including this critical info earlier. As I said above, there are no problems if both sender and receiver are in socket-system-call mode. – kvp Jun 11 '21 at 12:47
  • I updated the question with /proc/net/dev from both sender and receiver (captured inside the container). – kvp Jun 11 '21 at 13:03
  • Thanks for the update @kvp. As per your update you are using Mellanox VF interfaces to send data between pods. `Is data center bridging or Mellanox embedded switch bridging enabled?` Can you confirm whether the data reaches `VF-2` without the pods running? In your current logs there is no `before and after` to compare. – Vipin Varghese Jun 11 '21 at 14:08
  • Actually both of my container applications carry duplex traffic. Using the same 2 VFs, traffic works when both sender and receiver are in DPDK mode. Pod1 has VF1: it uses DPDK on VF1 to receive and DPDK on VF1 to send. Pod2 has VF2: it uses DPDK on VF2 to send and a UDP socket on VF2 to receive. When both ends are DPDK, it works perfectly. When the sender uses DPDK and the receiver listens on a socket, it does not work. – kvp Jun 11 '21 at 15:43
  • I am happy to make myself available on Skype, Google Hangouts, or Zoom for a call and debug, as certain statements in the question and comments contradict each other. Please feel free to reach me once you are ready. – Vipin Varghese Jun 12 '21 at 04:58
  • Thanks for offering @vipin, I need to check if my org would allow a live debug session - checking on this. But I have a few questions on DPDK behavior - can you please help me understand https://stackoverflow.com/questions/62755041/dpdk-hw-offloaded-calculation-of-udp-checksum-not-working/62777421#62777421 There you mention that each mbuf about to be sent must use mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM; Is this needed if the device is already opened with the same offload settings? (A sketch of these per-mbuf flags follows the comments below.) – kvp Jun 12 '21 at 07:41
  • Another question: on VF2 in my setup, I set up 1 rxQ and 1 txQ using the RTE ethdev APIs, but since only sending uses DPDK, I never call rte_eth_rx_burst() on VF2. So will the received packets (packets whose dstMac/dstIP match VF2) be automatically handed to the Linux kernel? I need the Linux TCP/IP stack to handle the rx packets of this VF - only then can the UDP-socket-based app see the packets sent by VF1. Am I missing something here? – kvp Jun 12 '21 at 07:44
  • As mentioned, it is easier if you can come on a live debug; there is a lot of data that is not clear. Can you come on Skype, Google Hangouts, or Zoom? If you have a generic question, can you please share a code snippet and the question? – Vipin Varghese Jun 12 '21 at 07:48
  • The easiest way to debug is to reproduce the error with a DPDK example like `skeleton`; we can use that for debugging. I am available now. – Vipin Varghese Jun 12 '21 at 07:58
  • Are you able to reproduce the error with the skeleton app on MACHINE-1? Let me know if I need to wait for you on Skype, Google Hangouts, or Zoom. – Vipin Varghese Jun 12 '21 at 08:14
  • @VipinVarghese - as I said, this is software from my org and I am not allowed to share it directly. I will prepare container images that can work with the dpdk-skeleton example and then we can have a live debug session. Please allow me some time. – kvp Jun 12 '21 at 14:59
  • Please note your question is really misleading: in the title you mention it is 2 different systems, while a later edit says it is the same machine running CentOS. You have missing information such as `1) DPDK version, 2) pcap dump of VF-1 TX and VF-2 RX, 3) before and after packet counts for VF-1 and VF-2, 4) request for tcpdump on VF-2`. Answering these does not involve a `code snippet`, but sure, I will wait for you to reproduce with a DPDK sample application. – Vipin Varghese Jun 12 '21 at 16:45
  • Re-wrote the question for clarity. Could not follow what you meant by "pcap dump". Will capture packet counts before & after sending traffic in my next run and update along with a tcpdump in Pod2. – kvp Jun 13 '21 at 20:16
  • For the failure scenario `pod1-dpdk-pmd-tx-over-vf1 --> pod2-udp-socket-rx-bound-to-vf2 ==> FAILURE.`, where VF-2 is on the kernel driver: if the packet is received and dropped at any TCP/IP layer, the drop stats in `cat /proc/net/dev` for that interface should increment. With respect to the `pcap dump`: please execute `tcpdump -eni [vf-2 name] udp -Q in` to capture all ingress packets. Once again I extend my offer to debug with the sample application `skeleton`. Let me know ASAP. – Vipin Varghese Jun 14 '21 at 01:26
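
For reference, the per-mbuf offload flags discussed in the comments above (the mb->ol_flags question) look like this in practice. This is a minimal sketch with DPDK 19.11 names; request_tx_cksum_offload is an illustrative helper and the layout assumes an Ethernet / IPv4-without-options / UDP frame:

    #include <rte_mbuf.h>
    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_udp.h>

    /* Per-mbuf checksum-offload request.  Per the linked answer, enabling
     * DEV_TX_OFFLOAD_IPV4_CKSUM / DEV_TX_OFFLOAD_UDP_CKSUM at the port level
     * only advertises the capability; each transmitted mbuf still has to
     * request the offload via ol_flags and the l2/l3 length fields. */
    static void request_tx_cksum_offload(struct rte_mbuf *m)
    {
        struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m, struct rte_ipv4_hdr *,
                                                          sizeof(struct rte_ether_hdr));
        struct rte_udp_hdr *udp = (struct rte_udp_hdr *)(ip + 1); /* no IP options */

        m->l2_len    = sizeof(struct rte_ether_hdr);
        m->l3_len    = sizeof(struct rte_ipv4_hdr);
        m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM;

        ip->hdr_checksum = 0;                                    /* filled by HW  */
        udp->dgram_cksum = rte_ipv4_phdr_cksum(ip, m->ol_flags); /* pseudo-header */
    }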
