Client Side:

if (ib_poll_cq(cq, 1, &wc) > 0) {
    if (wc.status == IB_WC_SUCCESS)
        printk("Successful\n");
    else
        printk("Failure: %d\n", wc.status);
}

Server Side:

do {
   num_comp = ibv_poll_cq(s_ctx.recv_cq, 1, &wc);
} while (num_comp == 0);

The client side runs in kernel space and the server side runs in user space. On the client, wc.status comes back as 12. What could be causing this?

S. Salman

1 Answer

A value of 12 in wc.status is IB_WC_RETRY_EXC_ERR (IBV_WC_RETRY_EXC_ERR in user space), the "retry exceeded" error. It means the node that saw the error (let's call it local) tried to send or to perform an RDMA operation and did not get a response from the other node. This can happen if the remote QP wasn't set up correctly, that is, it is not in the RTR state with parameters matching the local QP's parameters.

You can find some details about the various ibv_wc codes in this blog post.
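As a quick reference for decoding a numeric status such as 12, here is a minimal sketch that mirrors the first 14 values of `enum ibv_wc_status` in their declaration order. The table and the helper `wc_status_name` are illustrative names of my own, not part of any library.

```c
#include <stddef.h>

/* Local mirror of the first 14 values of enum ibv_wc_status, in declaration
 * order. Illustration only: real code should use the library's own enum. */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",            /*  0 */
    "IBV_WC_LOC_LEN_ERR",        /*  1 */
    "IBV_WC_LOC_QP_OP_ERR",      /*  2 */
    "IBV_WC_LOC_EEC_OP_ERR",     /*  3 */
    "IBV_WC_LOC_PROT_ERR",       /*  4 */
    "IBV_WC_WR_FLUSH_ERR",       /*  5 */
    "IBV_WC_MW_BIND_ERR",        /*  6 */
    "IBV_WC_BAD_RESP_ERR",       /*  7 */
    "IBV_WC_LOC_ACCESS_ERR",     /*  8 */
    "IBV_WC_REM_INV_REQ_ERR",    /*  9 */
    "IBV_WC_REM_ACCESS_ERR",     /* 10 */
    "IBV_WC_REM_OP_ERR",         /* 11 */
    "IBV_WC_RETRY_EXC_ERR",      /* 12 */
    "IBV_WC_RNR_RETRY_EXC_ERR",  /* 13 */
};

/* Hypothetical helper: returns the name for a status value, or "unknown"
 * when the value falls outside the mirrored range. */
static const char *wc_status_name(int status)
{
    size_t count = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

    if (status < 0 || (size_t)status >= count)
        return "unknown";
    return wc_status_names[status];
}
```

In a real user-space program you would not need this table: libibverbs provides `ibv_wc_status_str()`, which returns a descriptive string for a given `enum ibv_wc_status` value.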

haggai_e
  • The ib_poll_cq is setting wc.status to 12, and ibv_poll_cq is setting wc.status to a junk value (e.g. 432882). – S. Salman Jul 17 '16 at 09:30
  • What was the returned value from ibv_poll_cq? – haggai_e Jul 17 '16 at 10:10
  • It always returns 0, but wc.status holds a junk value. – S. Salman Jul 17 '16 at 11:43
  • Right. `ibv_poll_cq` returns the number of completions that were copied to its output parameter, or a negative error code. If it returned zero, it means no completions have arrived since the last call, and the fields of `wc` are left unmodified by the call. – haggai_e Jul 28 '16 at 06:12
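The return-value rule from the last comment can be sketched as a polling loop that only reads `wc` after the poll call reports a completion. To keep the sketch compilable without libibverbs or RDMA hardware, `struct fake_wc` and `fake_poll_cq` are hypothetical stand-ins of my own for `struct ibv_wc` and `ibv_poll_cq`; the stand-in takes a call counter where the real function takes a CQ pointer.

```c
/* Hypothetical stand-in for struct ibv_wc; mirrors only the status field. */
struct fake_wc {
    int status;
};

/* Hypothetical stand-in for ibv_poll_cq. It reports "no completion yet"
 * on the first two calls and leaves *wc untouched, then delivers one
 * completion with status 12 on the third call. */
static int fake_poll_cq(int *calls, int num_entries, struct fake_wc *wc)
{
    (void)num_entries;
    if (++*calls < 3)
        return 0;       /* nothing arrived: wc is deliberately not written */
    wc->status = 12;
    return 1;           /* one completion copied into *wc */
}

/* Polls until a completion arrives. Returns the completion status, or -1
 * if the poll call itself failed. wc->status is read only after the poll
 * call returned a positive count. */
static int wait_for_completion(int *calls, struct fake_wc *wc)
{
    int n;

    do {
        n = fake_poll_cq(calls, 1, wc);
    } while (n == 0);

    if (n < 0)
        return -1;      /* polling error: wc contents are not meaningful */
    return wc->status;
}
```

With the real API the loop has the same shape: check the return value of `ibv_poll_cq` first, and inspect `wc.status` only when that value is greater than zero. Reading `wc.status` after a return of 0 yields whatever was in the uninitialized structure, which would explain a junk value such as 432882.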