Delay of incoming network package on Linux - How to analyse?

Question

The problem is: Sometimes tcpdump sees that the receiving of a UDP packet is held back until the next incoming UDP packet, although the network tap device shows it goes without delay through the cable.

Scenary: My profinet stack on Linux (located in user space) has a cyclic connection where it receives and sends Profinet protocol packets every 4ms (via raw sockets). About every 30 ms it also receives UDP packets in another thread on a UDP socket and replies them immediately, according to that protocol. It's around 10% CPU load. Sometimes it seems such received UDP packets are stuck in the network driver. After 2 seconds the next UDP packet comes in and both, the missed UDP packet and that next one is received. There are no dropped packets.

My measuring:

I use tcpdump -i eth0 --time-stamp-precision=nano --time-stamp-type=adapter_unsynced -w /tmp/tcpdump.pcap to record the UDP traffic to a RAM disk file.
At the same time I use a network tap device to record the traffic.

Question:

How to find out where the delay comes from (or is it a known effect)? (2. What does the timestamp (which tcpdump sets to each packet) tell me? I mean, which OSI layer refers it to, in other words: When is it taken?)

Topology: "embedded device with Linux and eth0" <---> tap-device <---> PLC. The program "tcpdump" is running on the embedded device. The tap device is listening on the cable. The actual Profinet connection is between PLC and embedded device. A PC is connected on the tap device to record what it is listening to.

Wireshark (via tap and tcpdump): see here (packet no. 3189 in tcpdump.pcap)

Seems like you're describing a "rotting packet" problem, which has been known to exist in some Ethernet device drivers implementing NAPI while transitioning between poll to interrupt mode. See https://stackoverflow.com/questions/23574203/regarding-napi-implementation-in-linux-kernel and https://elixir.bootlin.com/linux/v2.6.16.11/source/Documentation/networking/NAPI_HOWTO.txt#L632 — sawdust, Aug 27 '21 at 08:03
@sawdust During that time (when the probably "rotten UDP packet" comes via cable and 2 seconds later when the next UDP packet arrives and resolves it) a lot of Profinet packets are read from a raw socket in a parallel thread. So I guess somehow as if the Ethernet driver actually is doing its job without sleeping too much (and missing things). — falkb, Aug 27 '21 at 08:31
Additional info: Recording tcpdump with --time-stamp-precision=nano still shows a 2 seconds delay for the problem UDP packet (that's the time it takes until the next UDP packet comes in) — falkb, Aug 27 '21 at 08:39
What kernel version is this? What driver and device? *"During that time ... a lot of Profinet packets are read"* -- Don't rely on userspace activity to guess when the driver is handling interrupts! In fact, that's probably the packet burst that triggers the problem. Besides, EMAC hardware can't distinguish between UDP and Profinet frames. A rotting packet occurs because RX becomes idle after a packet burst. Try pinging that host to see if that can affect the 2 sec delay. From my understanding, Profinet is built on UDP, so it's unlikely you are seeing a UDP stack issue. — sawdust, Aug 27 '21 at 09:20
The kernel is "5.4.61-rt37-lf-5.4.y+g20de6d5a36a4" and "dmesg|grep net" outputs: "001: audit: initializing netlink subsys (disabled) , 003: libphy: fec_enet_mii_bus: probed , 000: fec 30be0000.ethernet eth0: registered PHC device 0 , 000: fsl_dpa: FSL DPAA Ethernet driver , 000: tsn generic netlink module v1 init... , 001: Atheros 8031 ethernet 30be0000.ethernet-1:00: attached PHY driver [Atheros 8031 ethernet] (mii_bus:phy_addr=30be0000.ethernet-1:00, irq=POLL)" — falkb, Aug 27 '21 at 09:27
@sawdust: If you want to, you can have a look at both recordings, of the TAP and of the Linux tcpdump: https://www.dropbox.com/s/cbifszjw89xs464/recordings.zip?dl=0 It's the packet "Request" No. 9888 in DUT--Tap--S71200__tap.pcapng and No. 2683 in DUT--Tap--S71200___tcpdump.pcap. You can see it comes in on Linux 2 seconds too late. A ping packet resolves the receiving after 2 seconds. — falkb, Aug 27 '21 at 09:47
@sawdust: The packet timestamp is 2s too late, and it's a hardware timestamp, that's why I also think the rotten packet is a very lowlevel problem in the network stack. But I wonder why meanwhile the network stack is processing the other Profinet packets as if nothing is wrong. Profinet packets are not TCP or UDP, they have an own type. — falkb, Aug 27 '21 at 13:04
I added another wireshark recording to my question. Interestingly I want also to mention "ifconfig eth0" reports dropped RX packets to me. — falkb, Aug 31 '21 at 10:19
@stackinside here it is: https://www.dropbox.com/s/5mcb7d84wblj6wn/ethtool-output.txt?dl=0 — falkb, Aug 31 '21 at 15:08
@stackinside "ethtool -K eth0 gro off" doesn't help, and here's the output of "ethtool -c eth0", everything is 0 except these ones: rx-usecs: 1000 rx-frames: 200 tx-usecs: 1000 tx-frames: 200 — falkb, Sep 03 '21 at 07:35
@stackinside: sorry, your latest ethtool options didn't help either. I noticed the driver is the "FSL DPAA Ethernet driver" (dpaa_eth.c) of Freescale which has prettily changed since the version I use of 2 years ago (yocto Linux dunfell). — falkb, Sep 06 '21 at 07:25

score 0 · Answer 1 · answered Nov 24 '21 at 12:06

It was a bug in the freescale Fast Ethernet Driver (fec_main.c) which NXP has fixed by its awesome support now.

The actual answer (for the question "How to find out where the delay comes from?") is: One has to build a Linux with kernel tracing on, patch the driver code with kernel tracing and then analyse such tracing with the developer Linux tool trace-cmd. It's a very complicated thing but I'm very happy it is fixed now:

trace-cmd record -o /tmp/trace.dat -p function -l fec_enet_interrupt -l fec_enet_rx_napi -e 'fec:fec_rx_tp' tcpdump -i eth0 --time-stamp-precision=nano --time-stamp-type=adapter_unsynced -w /tmp/tcpdump.pcap

Delay of incoming network package on Linux - How to analyse?

1 Answers1