How to avoid getting stuck in rdma_get_recv_comp() or __ibv_get_cq_event()?

Question

Fellow RDMA hackers, does anyone know whether rdma_get_recv_comp(), which calls __ibv_get_cq_event(), ever times out?

My problem is with the same programs as shown here: RDMA program randomly hangs

It works fine, but it's not robust against random client disconnects. Specifically, if I forcefully kill the client, the server gets stuck in rdma_get_recv_comp() / ibv_get_cq_event().

This is for a Mellanox ConnectX-3 and I checked that the default timeout is 2.14s and retries = 1. But I'm not clear if ibv_get_cq_event() in blocking mode will even time out. The explanation of timeout in the ibv_modify_qp() documentation seems to suggest timeouts only apply for sends (rdma_get_send_comp()) since only senders wait for ACKs. But I don't see any difficulty in allowing receives to have a timeout too.

If hanging inside rdma_get_recv_comp() is expected in this case, how can I avoid it or implement a timeout?

Some possibilities:

change my client shutdown sequence so that it performs all the necessary sends so that it won't leave rdma_get_recv_comp() on the server hanging?
replace rdma_get_recv_comp() with a loop that polls for receive completions

score 2 · Accepted Answer · answered Jul 01 '16 at 18:04

ibv_get_cq_event() does not time out. It waits for completion events (which are generated when a work request completes and generates a completion queue entry). If no event is generated, say because your receive never completes, then you will wait forever. If the QP (connection) transitions to the error state, then all the posted receives will complete with a flush status (IBV_WC_WR_FLUSH_ERR). However, if you destroy the QP before polling all of those completions, they will be removed from the CQ.

So your problem may be that when the client disconnects, the other side doesn't necessarily detect the disconnection. For example, if the client just reboots, the RDMA CM won't disconnect cleanly, and if the server side doesn't have any sends in flight, it won't notice the disconnect. You can deal with this with some sort of keepalive. 0-byte RDMA WRITEs work well for this, since they are NOPs but will fail if anything goes wrong with the connection.

Or it may be that your server is too eager to destroy the QP when it gets a disconnect notification from the RDMA CM. You want a reference count on your connection structure so that you wait for everything you're going to wait for before you destroy the QP.
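A minimal sketch of such a reference count, assuming one reference is held by the connection itself and one by each outstanding receive. The names struct conn, conn_get() and conn_put() are hypothetical, and the destroyed flag only stands in for the real teardown (for example rdma_destroy_qp()):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-connection state; not part of librdmacm or libibverbs. */
struct conn {
    atomic_int refs;   /* 1 for the connection + 1 per outstanding work request */
    bool destroyed;    /* stand-in for "QP has been destroyed" */
};

void conn_init(struct conn *c)
{
    atomic_init(&c->refs, 1);   /* initial reference owned by the connection */
    c->destroyed = false;
}

/* Take a reference, e.g. just before posting a receive. */
void conn_get(struct conn *c)
{
    atomic_fetch_add(&c->refs, 1);
}

/* Drop a reference, e.g. when a completion (including a flushed one) has
 * been polled, or when the disconnect event arrives. Returns true if this
 * call dropped the last reference and performed the teardown. */
bool conn_put(struct conn *c)
{
    if (atomic_fetch_sub(&c->refs, 1) == 1) {
        /* Nothing is outstanding any more: this is the point where the
         * real code would destroy the QP. */
        c->destroyed = true;
        return true;
    }
    return false;
}
```

With this layout the disconnect handler only calls conn_put() on the initial reference, and the QP goes away once the last flushed completion has been polled and its reference dropped.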

Finally, it is possible to use ibv_get_cq_event() in a non-blocking way. The manpage has an example of using poll() on the underlying completion channel file descriptor to wait for events with a timeout.
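The waiting part of that approach can be sketched with plain POSIX calls. This is only an illustration: set_nonblocking() and wait_readable() are hypothetical helper names, and the file descriptor is assumed to be the fd member of the struct ibv_comp_channel (with the rdma_cm helpers, id->recv_cq_channel->fd):

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>

/* Put fd into non-blocking mode so a later ibv_get_cq_event() on the
 * channel cannot block. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * Note: a signal (EINTR) restarts the wait with the full timeout. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;              /* poll() itself failed */
    if (rc == 0)
        return 0;               /* timed out: no completion event yet */
    return (pfd.revents & POLLIN) ? 1 : -1;  /* POLLERR/POLLHUP count as errors */
}
```

On the server this would roughly replace the blocking wait: arm the CQ with ibv_req_notify_cq(), call wait_readable() on the channel fd, and only when it returns 1 call ibv_get_cq_event(), ibv_ack_cq_events() and ibv_poll_cq(). A return of 0 is the place to decide that the client is gone, or to send a keepalive.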
