Questions tagged [rdma]

RDMA refers to "Remote Direct Memory Access," which is a set of networking technologies typically used for high performance, low latency communication.

RDMA networks have three main attributes:

  1. Asynchronous queueing: network operations are submitted by adding requests to "work queues," which are executed asynchronously by hardware; when an operation completes, the result is returned to software in a "completion queue."
  2. Kernel bypass: userspace processes can submit operations directly to the network adapter hardware without any system calls.
  3. One-sided operations: one system can read from or write into the memory of a remote system without involving any software on the remote system.
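
The asynchronous queueing model in the first attribute can be sketched as a toy simulation. This is a hedged illustration only, not a real RDMA API: the class `ToyQueuePair` and its methods `post`, `hardware_step`, and `poll` are hypothetical names, and the "hardware" is just a Python method called by hand.

```python
from collections import deque


class ToyQueuePair:
    """Toy model of a work queue plus completion queue (hypothetical, not a real RDMA API)."""

    def __init__(self):
        self.work_queue = deque()        # requests posted by software
        self.completion_queue = deque()  # results reported by the simulated hardware

    def post(self, wr_id, payload):
        # Posting only enqueues the request; it returns before any work is done.
        self.work_queue.append((wr_id, payload))

    def hardware_step(self):
        # Stand-in for the adapter: execute one queued request, if there is one,
        # and report the outcome on the completion queue.
        if self.work_queue:
            wr_id, payload = self.work_queue.popleft()
            self.completion_queue.append(
                {"wr_id": wr_id, "status": "success", "byte_len": len(payload)}
            )

    def poll(self, max_entries):
        # Return up to max_entries completions without blocking.
        done = []
        while self.completion_queue and len(done) < max_entries:
            done.append(self.completion_queue.popleft())
        return done


qp = ToyQueuePair()
qp.post(1, b"hello")
qp.post(2, b"rdma")
pending = qp.poll(8)      # empty: nothing has been executed yet
qp.hardware_step()
qp.hardware_step()
completions = qp.poll(8)  # one completion per posted request, in posting order
```

The point of the sketch is the separation between submitting a request and learning its result: software learns the outcome only by polling the completion queue, which is the same pattern the questions below describe with ibv_post_send() and ibv_poll_cq().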
177 questions
0
votes
1 answer

RDMA without the Connection Manager: skipping rdma_create_event_channel()

Is there a way to implement RDMA RC functionality without invoking the Connection Manager, and therefore skipping rdma_create_event_channel()? Perhaps with a simple exchange of the necessary information through an external protocol, say UDP…
Jim
  • 135
  • 4
0
votes
2 answers

Setting max outstanding work requests to be put on a Send Queue of a Queue Pair in RDMA

I am trying to create a QueuePair with ibv_create_qp() and I have to describe the size of the Queue Pair by setting the fields of the struct ibv_qp_cap and providing it to the create function. My issue is with the max_send_wr field which corresponds…
kfertakis
  • 127
  • 1
  • 5
0
votes
1 answer

Does gRPC+MPI require RDMA?

Tensorflow allows for the options "gRPC", "gRPC+verbs" and "gRPC+mpi" when specifying a communication protocol. In the gRPC+verbs documentation, it clearly states that this protocol is based on RDMA. Meanwhile, in the gRPC+MPI documentation, it…
JRL
  • 31
  • 3
0
votes
1 answer

How to test the transfer rate of an RDMA read operation

I want to calculate the data transfer rate of RDMA, using a ConnectX-5 whose declared bandwidth is 200Gbps. I wrote data transfer code that uses the RDMA read operation. I set the start timestamp just before ibv_post_send(), and set the end timestamp just…
HannanKan
  • 163
  • 2
  • 14
0
votes
1 answer

How to build perftest and run latency tests on RDMA

I'm trying to build the perftest C library to run some latency tests over the RDMA protocol. My steps: I downloaded the library from their GitHub and unzipped it on the box on which I want to run the server for the tests I…
Gabe
  • 5,997
  • 5
  • 46
  • 92
0
votes
0 answers

Force Accelio to use RDMA

We are using Accelio (https://github.com/accelio) underneath JXIO in order to send packages over the RDMA protocol. JXIO is Java API over AccelIO (C library). AccelIO (http://www.accelio.org/) is a high-performance asynchronous reliable messaging…
Gabe
  • 5,997
  • 5
  • 46
  • 92
0
votes
1 answer

Will the RDMA enabled NIC do endian conversion?

Is it possible to get an RDMA adapter (e.g. Mellanox NIC) to do an endian conversion during data transfer? Specifically, we're doing an RDMA transfer from a big-endian to a little-endian system and vice versa. Once data lands at the target, then…
B Abali
  • 433
  • 2
  • 10
0
votes
1 answer

RDMA: how do I know if a Read request is responded after polling the CQ?

I have N qps and I will send M RDMA Read requests through the send queue in each qp. The read request is sent by ibv_post_send() and the cq is polled iteratively using ibv_poll_cq(). The question is, if I get some work completions (WC) after calling…
Bloodmoon
  • 1,308
  • 1
  • 19
  • 34
0
votes
1 answer

Writing a message longer than 1024B (MTU) fails in SoftRoCE mode

When I write a message longer than 1024B (MTU), it fails in SoftRoCE mode; please help me check why. Using the standard tool ib_write_lat to test: when ib_write_lat -s 1024 -n 5 When ib_write_lat -s 1025 -n 5, it fails. My softroce version in…
0
votes
2 answers

InfiniBand transport layer

I am having trouble understanding the context from http://www.redbooks.ibm.com/redbooks/pdfs/sg247351.pdf It describes how the transport layer is implemented, but everything I have read explains only the features, not how the transport layer is…
Sungho Hong
  • 340
  • 2
  • 16
0
votes
1 answer

Tensorflow failed to use RDMA on Infiniband network

Executing: python tf_cnn_benchmarks.py \ --local_parameter_device=gpu \ --num_gpus=1 \ --batch_size=2 \ --model=alexnet \ --variable_update=distributed_replicated \ --job_name=ps \ --ps_hosts=192.168.230.107:50000 \ …
0
votes
0 answers

How does InfiniBand get the message from the POSIX socket API?

I am trying to figure out how RDMA over InfiniBand understands and sends a message when only the traditional POSIX API is used. I looked through every document and paper related to InfiniBand but could not answer my question. Would it be possible…
Sungho Hong
  • 340
  • 2
  • 16
0
votes
1 answer

RDMA - how does the ibv_post_send know the CQ?

I'm trying to learn RDMA. My question is, when doing ibv_post_send, I should get a completion event on a completion queue. But how does ibv_post_send know which completion queue to use? From what I see here, I could have created many.
Elad Weiss
  • 3,662
  • 3
  • 22
  • 50
0
votes
1 answer

What is the relation between CEE and RoCE, iWARP?

I was trying to put together a document about CEE, RoCE, iWARP, RDMA, etc. I searched and found that CEE is short for Converged Enhanced Ethernet, and that it includes the 802.1Qbb, 802.1Qau, and 802.1Qaz standards. Can I regard RoCE and iWARP as realizations of CEE…
li7hui
  • 33
  • 5
0
votes
2 answers

How to test RDMA performance between two nodes with Intel's IMB benchmark

Hello, I am new to using RDMA benchmark tools and am currently trying to use Intel's IMB benchmark https://software.intel.com/en-us/articles/intel-mpi-benchmarks. So far I have tested all the examples in the document. However, I have no idea how to…
Sungho Hong
  • 340
  • 2
  • 16