I am seeing a very high number of L3 cache misses on AMD while running DPDK-based forwarding/routing applications. My application consists of a packet poll thread (say P1) and two worker threads, W1 and W2. P1 polls the NIC and sprays packets to W1 or W2. Each worker does a fixed amount of per-packet work and sends the packets back to P1 for transmission. On the AMD 7702 I am not able to cross 22 Mpps, and on the AMD 7542 it is just 15 Mpps. Compare this to the Intel Xeon 6248R, where the same application gets ~40 Mpps. The NIC is a Mellanox ConnectX-5 dual-port 100 Gbps.
I also see L3 cache misses even if I simply drop packets at Rx: the packet poll thread receives a burst and immediately frees all of the packets, without handing them to the workers. Even then, I notice very high L3 cache misses, although everything happens in the same thread. I have also tried running testpmd, but I don't get beyond 35 Mpps. The numbers in this report look far higher in comparison (although the hardware is different):
https://fast.dpdk.org/doc/perf/DPDK_19_08_Mellanox_NIC_AMD_performance_report.pdf