
I want to assign RX queues to cores with a 1:1 mapping, and I am using an mlx5 NIC. I want to make different changes to the RX queue of each core, so I need to know the mapping between RX queue indexes and CPU cores.
I have noticed the function shown below in drivers/net/mlx5/mlx5_rxq.c (DPDK 18.05).

struct mlx5_rxq_ctrl *
mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
         unsigned int socket, const struct rte_eth_rxconf *conf,
         struct rte_mempool *mp)

This function creates a DPDK RX queue, and every RX queue has an index. I want to know the mapping between RX queue indexes and logical CPU core numbers. For example, will RX queue 1 be mapped to core 1? Will each RX queue be mapped to the CPU core with the same index? And is the mapping fixed, or can it be changed?

  • Can you please rephrase or clarify the question? Mellanox CX-5 and CX-6 NICs support more than one queue (multiple queues). One can run the queue-to-core mapping as 1:1 or n:1. `Best performance achieved is 3 queues to 1 core for 64B with vector PMD mode`. For this one need not modify any queue code in the driver for the DPDK PMD, but `simply modify the queue to core mapping in application`. Or is your question about `dynamic queue which is enabled by deferred queue start`? – Vipin Varghese Aug 29 '22 at 00:22
  • I am sorry for my expression; I have revised my question. What I wonder is `the mapping between the index of RX queue and the logical number of CPU core. Is this mapping fixed or can it be changed`? – xuxing chen Aug 29 '22 at 03:31
  • I wonder if there is any reference for `Best performance achieved is 3 queues to 1core for 64B with vector PMD mode`. – xuxing chen Sep 03 '22 at 02:24
  • Yes, there are references and internal benchmarks done for the same. – Vipin Varghese Sep 04 '22 at 06:36
  • Could you share the link to the reference? – xuxing chen Sep 04 '22 at 06:38
  • It is there in the AMD DPDK tuning guide (on the AMD developer forum) and in internal tests done on AMD platforms. Are you using an AMD EPYC platform? – Vipin Varghese Sep 04 '22 at 07:54
  • I am using Intel Xeon. – xuxing chen Sep 04 '22 at 08:47
  • You can get similar results if the platform is Intel Ice Lake: up to 90-95 Mpps can be achieved with 3 cores (running at 3.5 GHz), each core having 3 RX queues (total 8 RX queues). For achieving 148 Mpps you will end up using 12 cores with 16 RX queues on AMD Milan running at 3.5 GHz. On Intel you might need an extra core, that is 13 cores, since all-core turbo might not sustain 3.5 GHz. – Vipin Varghese Sep 04 '22 at 12:34
  • You can refer to https://stackoverflow.com/questions/72345569/why-does-dpdk-mellanox-connectx5-process-128b-packets-much-faster-than-other-s and https://stackoverflow.com/questions/66711987/peculiar-behaviour-with-mellanox-connectx-5-and-dpdk-in-rxonly-mode/72959976#72959976 – Vipin Varghese Sep 04 '22 at 12:45

2 Answers


There are no static mappings between logical queues and logical cores in DPDK. Once you allocate an RX or TX queue, it can be used from any logical core by calling rte_eth_rx_burst/rte_eth_tx_burst with the queue's index. However, these two functions (and most functions related to packet I/O) are not thread-safe, so it is the programmer's responsibility to prevent race conditions (e.g., two cores must not call RX functions on the same queue concurrently).

Generally, DPDK programs allocate queues to lcores freely by giving each lcore unique queue indexes, and DPDK doesn't care how queues are allocated to cores. You may refer to DPDK's l3fwd example (especially its lcore_conf struct) https://elixir.bootlin.com/dpdk/latest/source/examples/l3fwd/l3fwd.h#L81 for more details.
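
For illustration, here is a minimal sketch of that pattern (lcore_to_queue and rx_loop are hypothetical names, not taken from l3fwd): each lcore is handed its own queue index up front and then polls only that queue, so the non-thread-safe burst calls never race.

#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* hypothetical per-lcore mapping, filled in by the application at
 * init time; DPDK itself imposes no queue-to-core relationship */
static uint16_t lcore_to_queue[RTE_MAX_LCORE];

static int
rx_loop(void *arg)
{
    const uint16_t port_id = 0;  /* single port for brevity */
    const uint16_t queue_id = lcore_to_queue[rte_lcore_id()];
    struct rte_mbuf *bufs[BURST_SIZE];
    uint16_t i, nb_rx;

    (void)arg;
    for (;;) {
        /* safe: no other lcore ever polls this queue */
        nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
        for (i = 0; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);  /* process, then free */
    }
    return 0;
}

After the application fills lcore_to_queue, each worker would be started with rte_eal_remote_launch(rx_loop, NULL, lcore_id); changing the table (while the affected lcores are paused) changes the mapping.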

minhuw

As per the clarification shared in the comments, the question is: is the mapping between the RX queue index and the logical CPU core number fixed, or can it be changed?

[Answer] DPDK's rte_eth_rx_burst and rte_eth_tx_burst are not thread-safe for the same port and queue. Parallel execution is safeguarded by unique queue IDs, which means that at any given point of execution multiple threads must not use the same queue on the same port.

It is up to the application logic to decide which lcore is used for which port-queue pair. For example:

#include <rte_common.h>
#include <rte_ethdev.h>

struct portQueuepair
{
  uint16_t port;
  uint16_t queue;
};

/* global scope */
struct portQueuepair portqueue_Map[256];

int
init_portqueue_map(void)
{
  /* set default mapping; pairs are later handed out to lcores */
  uint16_t mapIndex = 0;
  uint16_t i, j;

  /* rte_eth_dev_count() as of DPDK 18.05; use
   * rte_eth_dev_count_avail() on newer releases */
  for (i = 0; i < rte_eth_dev_count(); i++)
  {
     struct rte_eth_dev_info info;

     /* get port info */
     rte_eth_dev_info_get(i, &info);

     /* ensure num RX queues == num TX queues */
     if (info.nb_rx_queues != info.nb_tx_queues)
       return -1;

     for (j = 0; j < info.nb_rx_queues; j++)
     {
        if (mapIndex >= RTE_DIM(portqueue_Map))
          return -1;

        portqueue_Map[mapIndex].port = i;
        portqueue_Map[mapIndex].queue = j;

        mapIndex += 1;
     }
  }
  return 0;
}
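
Each worker lcore can then be handed one entry of this map at launch, for example (a sketch; pair_loop is a hypothetical per-pair polling function defined by the application, not a DPDK API):

#include <rte_launch.h>
#include <rte_lcore.h>

/* hypothetical polling function: bursts RX/TX on the pair passed in arg */
extern int pair_loop(void *arg);

void
launch_workers(uint16_t total_pairs)
{
  unsigned lcore_id;
  uint16_t idx = 0;

  /* one port-queue pair per worker lcore, so burst calls never race;
   * RTE_LCORE_FOREACH_SLAVE is the DPDK 18.05 name (later renamed
   * RTE_LCORE_FOREACH_WORKER) */
  RTE_LCORE_FOREACH_SLAVE(lcore_id) {
     if (idx >= total_pairs)
       break;
     rte_eal_remote_launch(pair_loop, &portqueue_Map[idx], lcore_id);
     idx++;
  }
}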

During the program run, one can use either

  1. rte_eth_dev_rx_queue_stop and rte_eth_dev_tx_queue_stop for a specific port-queue pair, or
  2. rte_eth_dev_stop to stop the entire port and all its queues,

then remap the pair to the desired lcore, and restart via rte_eth_dev_start, or rte_eth_dev_rx_queue_start & rte_eth_dev_tx_queue_start for port-queue pairs.
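
A minimal sketch of option 1, assuming the PMD supports per-queue stop/start and that the lcore currently polling the pair has been quiesced first (queue_owner is a hypothetical application-side table, not a DPDK structure):

#include <rte_ethdev.h>

/* hypothetical table: which lcore owns each queue of each port */
unsigned queue_owner[RTE_MAX_ETHPORTS][16];

int
remap_queue(uint16_t port, uint16_t queue, unsigned new_lcore)
{
  /* 1. stop the pair (returns an error if the PMD lacks support) */
  if (rte_eth_dev_rx_queue_stop(port, queue) != 0 ||
      rte_eth_dev_tx_queue_stop(port, queue) != 0)
    return -1;

  /* 2. remap in the application's own table */
  queue_owner[port][queue] = new_lcore;

  /* 3. restart the pair */
  if (rte_eth_dev_rx_queue_start(port, queue) != 0 ||
      rte_eth_dev_tx_queue_start(port, queue) != 0)
    return -1;

  return 0;
}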

With respect to performance reference numbers, check the two StackOverflow questions linked in the comments above for running multiple queues on a single core.

Vipin Varghese