
I have a problem executing RDMA atomic operations (FETCH_ADD and CMP_AND_SWAP). When I try to post an atomic RDMA request, ibv_post_send() fails with errno set to EINVAL ("Invalid argument"). I have no such problems with RDMA READ/WRITE.

I register the memory region as follows:

local_buffer = new uint64_t[1];   // one 64-bit word, so the buffer is 8-byte aligned
local_mr = ibv_reg_mr(pd, local_buffer, sizeof(uint64_t),
    IBV_ACCESS_LOCAL_WRITE
    | IBV_ACCESS_REMOTE_READ
    | IBV_ACCESS_REMOTE_ATOMIC);
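
RDMA atomics operate on a 64-bit value, and the target address has to be 8-byte aligned. As a minimal sketch of how that alignment could be guaranteed and checked in plain C, assuming C11 aligned_alloc() is available: the helper names is_8byte_aligned() and alloc_atomic_target() are hypothetical and not part of libibverbs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if p lies on an 8-byte boundary, else 0. */
static int is_8byte_aligned(const void *p)
{
    return ((uintptr_t)p % 8) == 0;
}

/*
 * Hypothetical helper: allocate one zeroed 64-bit word on an 8-byte
 * boundary. Returns NULL on allocation failure; release with free().
 */
static uint64_t *alloc_atomic_target(void)
{
    uint64_t *p = aligned_alloc(8, sizeof(uint64_t));
    if (p == NULL)
        return NULL;
    assert(is_8byte_aligned(p));   /* sanity check on the allocator */
    *p = 0;
    return p;
}
```

The same is_8byte_aligned() check could be applied to the address the peer advertises before it is used as the remote address of an atomic work request.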

I build the queue pairs as follows:

memset(qp_attr, 0, sizeof(*qp_attr));
qp_attr->send_cq = s_ctx->cq;
qp_attr->recv_cq = s_ctx->cq;
qp_attr->qp_type = IBV_QPT_RC;
qp_attr->cap.max_send_wr = 10;
qp_attr->cap.max_recv_wr = 10;
qp_attr->cap.max_send_sge = 1;
qp_attr->cap.max_recv_sge = 1;
TEST_NZ(rdma_create_qp(id, s_ctx->pd, qp_attr));

And finally submit the RDMA operation with atomic opcode as follows:

struct ibv_send_wr wr, *bad_wr = NULL;
struct ibv_sge sge;
memset(&sge, 0, sizeof(sge));
sge.addr        = (uintptr_t)conn->local_buffer;
sge.length      = 8;
sge.lkey        = conn->local_mr->lkey;
memset(&wr, 0, sizeof(wr));
wr.wr_id                    = 0;
wr.opcode                   = IBV_WR_ATOMIC_FETCH_AND_ADD;
wr.sg_list                  = &sge;
wr.num_sge                  = 1;
wr.send_flags               = IBV_SEND_SIGNALED;
wr.wr.atomic.remote_addr    = (uintptr_t)conn->peer_mr.addr;
wr.wr.atomic.rkey           = conn->peer_mr.rkey;
wr.wr.atomic.compare_add    = 1ULL; /* value to be added to the remote address content */
if (ibv_post_send(conn->qp, &wr, &bad_wr)) {
    fprintf(stderr, "Error, ibv_post_send() failed\n");
    die("");
}   
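
To make the failure more informative, the return value can be printed: ibv_post_send() returns 0 on success and the errno value on failure. A minimal sketch, where describe_verbs_error() is a hypothetical helper and not a libibverbs function:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a message for a failed verbs call.
 * `ret` is the value the call returned (an errno code such as EINVAL).
 * Returns the number of characters snprintf() would have written.
 */
static int describe_verbs_error(const char *call, int ret,
                                char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s failed: %s (%d)",
                    call, strerror(ret), ret);
}
```

It could be used as: int ret = ibv_post_send(conn->qp, &wr, &bad_wr); and, if ret is non-zero, describe_verbs_error("ibv_post_send", ret, msg, sizeof(msg)) followed by printing msg to stderr.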

P.S. Since I'm using librdmacm, the queue pair transitions from INIT to RTR and RTS are performed automatically, so I cannot manually set qp_attr->qp_access_flags, qp_attr->max_rd_atomic and qp_attr->max_dest_rd_atomic using ibv_modify_qp(). However, I also wrote a small program using libibcm with atomic operations, setting those attributes manually during the queue pair transitions. Still no luck.

narengi
  • What type of IB adapter are you using? Is your driver library libmlx4, libmlx5, ...? – Roland Dec 22 '14 at 19:16
  • @Roland running ibv_devinfo shows that vendor_part_id is 4113. Does this mean that my IB adapter is Connect IB? The installed driver library is MLNX_OFED_LINUX-2.3-1.0.0-ubuntu14.04-x86_64. I'm not sure if this is what you're looking for. – narengi Dec 22 '14 at 22:43
  • It's not exactly what I asked, but it's enough for me to know that you're using libmlx5. What version of libmlx5 do you have installed? Looking at the libmlx5 source, I see the only way mlx5_post_send() can return EINVAL is because of an invalid opcode. So are you sure that MLNX_OFED version supports atomic ops on ConnectIB? (You can check with your Mellanox support) – Roland Dec 23 '14 at 01:08
  • @Roland I will check with Mellanox, but I guess that atomics are supported, as I'm able to run the Perftest benchmarking tool for atomic operations with no errors. May I ask a favor? Do you happen to have, or know of, a small self-contained program that uses atomic operations? That would probably make it much easier to spot the bug. – narengi Dec 23 '14 at 01:40
  • @Roland Its codebase is rather big. I looked into it but got lost. That's why I'm looking for something smaller – narengi Dec 23 '14 at 02:35
