The function ibv_get_cq_event() blocks and never returns, even after all resources have been destroyed.

I initialize all InfiniBand resources, launch ibv_get_cq_event() in a separate thread, and then destroy all InfiniBand resources, but ibv_get_cq_event() never returns.

What is the proper way to initialize and destroy IB (RDMA) resources?

Daniil

1 Answer

Blocking in ibv_get_cq_event() and then trying to destroy your resources is roughly the equivalent of creating a socket, blocking in read(), and then calling close() on the socket from a different thread. In fact, internally a completion channel is really just a file descriptor, and ibv_get_cq_event() is pretty much just a wrapper around read(). In both cases, the blocked read() holds a reference on the file, and the kernel won't wake it up just because someone else called close().

There are at least two reasonable ways to handle your situation:

  • Before trying to clean up IB resources, send a signal to the thread blocked in read() to wake it up. For this to work, the signal needs a handler installed without SA_RESTART, so that read() fails with EINTR instead of being restarted.

  • Use fcntl() to set O_NONBLOCK on comp_channel->fd and then use an event loop with poll() or epoll to know when the completion channel is readable. Only call ibv_get_cq_event() (and thus read()) when an event is there, and stop your event loop when tearing down the RDMA resources.
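The second option can be sketched roughly as follows. This is only a minimal illustration, not tested against real hardware: to keep it runnable without libibverbs, event_fd is a plain file descriptor standing in for comp_channel->fd, and the stop_fd "shutdown pipe" plus the names set_nonblocking() and event_loop() are hypothetical helpers made up for the example. In real code the read() on event_fd would be replaced by ibv_get_cq_event().

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put a file descriptor into non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Handle events on event_fd until stop_fd becomes readable.
 *
 * event_fd stands in for comp_channel->fd (assumption for this sketch).
 * stop_fd is the read end of a hypothetical shutdown pipe; the teardown
 * code writes one byte to the other end to stop the loop.
 *
 * Returns the number of bytes consumed from event_fd (one byte plays the
 * role of one completion event here), or -1 on error.
 */
int event_loop(int event_fd, int stop_fd)
{
    struct pollfd fds[2] = {
        { .fd = event_fd, .events = POLLIN },
        { .fd = stop_fd,  .events = POLLIN },
    };
    int handled = 0;

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }

        if (fds[0].revents & (POLLIN | POLLHUP)) {
            /* Real code would call ibv_get_cq_event() here instead. */
            char buf[64];
            ssize_t n = read(event_fd, buf, sizeof buf);
            if (n > 0)
                handled += (int)n;
            else if (n == 0)
                return handled;     /* writer closed: nothing more to read */
            else if (errno != EAGAIN && errno != EINTR)
                return -1;
        }

        if (fds[1].revents & POLLIN)
            return handled;         /* shutdown requested */
    }
}
```

With this shape, the teardown path would write a byte to the shutdown pipe, join the event thread, and only then destroy the CQ and the completion channel. Note that every event obtained with ibv_get_cq_event() also has to be acknowledged with ibv_ack_cq_events() before ibv_destroy_cq() can complete.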

Roland