Why does disabling IRQs on Linux cause rdma_read and rdma_write to fail?

Question

I have two host machines connected by Mellanox InfiniBand HCAs. I'm running a simple RDMA application that performs RDMA write and RDMA read operations from one machine (the client) against the other machine (the server). To find out which interrupts belong to the HCA on each machine, I ran the following command:

  less /proc/interrupts

  67:   475880   50253       0       0   PCI-MSI-edge   mlx4-async@pci:0000:01:00.0
  68:   399002       0      73       0   PCI-MSI-edge   mlx4_0-0
  69:        0    3264      23       0   PCI-MSI-edge   mlx4_0-1
  70:        0       0       0       0   PCI-MSI-edge   mlx4_0-2
  71:        0       0       0       0   PCI-MSI-edge   mlx4_0-3
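
The same lookup can be scripted. The following is a minimal sketch, assuming the /proc/interrupts layout shown above (IRQ number, one counter per CPU, then the controller type and the device name). The function name `parse_interrupts` and the `SAMPLE` string are hypothetical helpers for illustration; on a real host you would pass in the contents of /proc/interrupts instead.

```python
import re

def parse_interrupts(text, pattern="mlx4"):
    """Return {irq: (per_cpu_counts, name)} for lines whose description contains pattern."""
    result = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+):\s+(.*)", line)
        if not m:
            continue  # CPU header line or a non-numeric entry such as NMI
        fields = m.group(2).split()
        counts = []
        # Leading purely numeric fields are the per-CPU interrupt counters.
        while fields and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        description = " ".join(fields)
        if pattern in description:
            # Assumption: the last token is the device/handler name.
            result[int(m.group(1))] = (counts, fields[-1])
    return result

# Hypothetical sample, copied from the output above.
SAMPLE = """\
  67:   475880  50253       0       0   PCI-MSI-edge    mlx4-async@pci:0000:01:00.0
  68:   399002      0       73      0   PCI-MSI-edge    mlx4_0-0
  69:       0   3264        23      0   PCI-MSI-edge    mlx4_0-1
  70:       0       0       0       0   PCI-MSI-edge    mlx4_0-2
  71:       0       0       0       0   PCI-MSI-edge    mlx4_0-3
"""

if __name__ == "__main__":
    for irq, (counts, name) in sorted(parse_interrupts(SAMPLE).items()):
        print(irq, name, sum(counts))
```

Taking two snapshots, one before and one after running the RDMA test, and comparing the per-IRQ totals would show which of these interrupts actually fire during the transfer.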

On the server machine, I found experimentally that calling __disable_irq() on those 4 interrupts causes all RDMA read/write operations issued by the client to fail with the error message "transport retry counter exceeded".

My question is: why and when can RDMA read/write operations generate IRQs on the remote machine? I thought that since they don't involve the remote CPU, they would not raise any kind of IRQ.

Why, then, does disabling those interrupts cause these operations to fail?

Does your RDMA application use librdmacm or just libibverbs? (Or something else?) – haggai_e, Sep 24 '15 at 06:50
@haggai_e, the RDMA application uses `libibverbs`. Thanks for your time! – Fopa Léon Constantin, Sep 24 '15 at 07:31
It's difficult to tell what goes wrong in your experiment, but I suppose there are many other operations that may use these interrupts. Perhaps the opensm SM isn't able to communicate with the client system? – haggai_e, Sep 24 '15 at 19:30

score 1 · Answer 1 · edited Sep 23 '15 at 11:01

Not all transactions are RDMA transactions.

Furthermore, when you're writing to another machine's memory, you need interrupts to notice when the write has finished (so that you know when you can reuse your own memory), and to notify the other machine that new data has shown up in its memory.

edited Sep 23 '15 at 11:01

Fopa Léon Constantin

answered Sep 23 '15 at 10:10

CL.

Thank you for your answer. According to (http://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.rdma/rdma_write_or_rdma_write_with_immediate.htm and http://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.rdma/rdma_write_or_rdma_write_with_immediate.htm) and the RDMA RFC (https://www.ietf.org/rfc/rfc5040.txt), page 13, **no notifications** (IRQs?) are sent to the remote host during an RDMA read/write. That's why I'm asking why and when IRQs are involved in those operations. – Fopa Léon Constantin Sep 23 '15 at 11:00
