
While running the ib_read_bw test with 64 KB messages from a Mellanox CX-4 (the request initiator) to another RNIC, the Mellanox side starts retransmitting from the 5th RDMA READ onwards for 50 KB of data (the first 12 KB was ACKed successfully). After that it keeps retransmitting the same request for the remaining 50 KB, even though the target RNIC is responding.

One observation: the target RNIC responds with an MSN of 11 instead of 5 in the first RDMA READ response to the retransmitted (50 KB) read request.

The InfiniBand spec says that the RNIC should not increment the MSN for duplicate requests. Does this mean the RNIC should respond with whatever MSN it currently has (it may already have responded to all incoming requests and reached an MSN of 16 before the retransmission is seen), or should it respond with the MSN that belongs to the retransmitted RDMA READ?

Anji M

2 Answers

0

The InfiniBand spec says that:

For RDMA READ requests, the responder may increment its MSN after it has completed validating the request and before it has begun transmitting any of the requested data, and may return the incremented MSN in the AETH of the first response packet.

and

The MSN shall not be incremented for duplicate requests.

(C9-148)

I believe this means the MSN should remain unchanged when retransmission occurs.
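As a minimal sketch of that reading (not taken from the spec or from any RNIC implementation), the responder's counter could be modelled as below. The class `ResponderMsn` and its method `complete_request` are hypothetical names; the only fact relied on is that the MSN carried in the AETH is a 24-bit field.

```python
MSN_MASK = 0xFFFFFF  # the MSN carried in the AETH is a 24-bit field


class ResponderMsn:
    """Hypothetical model of a responder's per-QP MSN counter."""

    def __init__(self) -> None:
        self.msn = 0

    def complete_request(self, is_duplicate: bool) -> int:
        """Return the MSN to place in the AETH after handling a request.

        New requests advance the counter (modulo 2**24); duplicates leave
        it untouched, so under this reading a duplicate simply sees the
        current value.
        """
        if not is_duplicate:
            self.msn = (self.msn + 1) & MSN_MASK
        return self.msn
```

Under this model, a responder that has already completed 16 requests would answer a retransmitted request with whatever the counter currently holds, not with the value it had when the original request was first processed.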

haggai_e
  • Thanks for the comment. My understanding is the same. But the question is: what MSN should the responder include in its replies to a retransmitted packet? For example, in my case it's the 5th packet that is being retransmitted, so should the reply carry 5 as the MSN, or the latest MSN the responder has (some other packets, such as RDMA SENDs/WRITEs, might have been ACKed before the retransmission)? The spec does say the MSN shouldn't be changed, but it doesn't clearly say what the duplicate response should carry. – Anji M Feb 20 '20 at 04:37
  • As far as I understand it should be the latest, but I agree it is not clear. – haggai_e Feb 20 '20 at 12:39
0

Yes, as I understand it, the MSN should point to the original read request. When responding to a duplicate SEND or WRITE, both the PSN and the MSN are those of the last ACK sent, which works as a coalesced ACK.
When responding to a duplicate read request, however, the PSN is that of the original read request, so the MSN should also be that of the original read request.
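The rule described above could be sketched roughly as follows. This is only an illustration of this answer's interpretation, not something the spec states. The class `Responder`, its methods, and the opcode strings are hypothetical, and PSN wraparound is ignored for brevity.

```python
from dataclasses import dataclass, field

MSN_MASK = 0xFFFFFF  # the MSN in the AETH is 24 bits wide


@dataclass
class Responder:
    """Hypothetical responder that remembers the MSN of each RDMA READ."""

    msn: int = 0
    # One (first_psn, last_psn, msn) tuple per RDMA READ already handled.
    reads: list = field(default_factory=list)

    def new_request(self, opcode: str, first_psn: int, num_packets: int = 1) -> int:
        """Handle a new (non-duplicate) request and return the updated MSN."""
        self.msn = (self.msn + 1) & MSN_MASK
        if opcode == "RDMA_READ":
            last_psn = first_psn + num_packets - 1
            self.reads.append((first_psn, last_psn, self.msn))
        return self.msn

    def msn_for_duplicate(self, opcode: str, psn: int) -> int:
        """Pick the MSN for a duplicate response without changing the counter."""
        if opcode == "RDMA_READ":
            for first, last, msn in self.reads:
                if first <= psn <= last:
                    return msn  # MSN recorded for the original read
        return self.msn  # SEND/WRITE: latest MSN, acting as a coalesced ACK
```

With eleven 16-packet reads handled back to back, a duplicate read whose PSN falls inside the fifth read's range would get MSN 5 under this model, while a duplicate SEND or WRITE would get the current MSN of 11.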

From the spec: "to be considered a duplicate RDMA READ Request, the PSN of the duplicate request must be within the responder's current duplicate PSN region. Furthermore, to be considered a valid duplicate RDMA READ Request, the PSN of the duplicate request must fall within the range of PSNs allocated to the original RDMA READ Response, and the amount of data requested in the duplicate request must be entirely contained within the extent of data requested in the original RDMA READ Request. In other words, the data requested in the duplicate RDMA READ Request must be a proper subset of the data requested in the original RDMA READ Request. If the starting PSN and length of a duplicate RDMA READ Request does not fall within the range of PSNs allocated to the original RDMA READ Response, the request is invalid and the responder may silently drop the duplicate RDMA READ Request"
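The validity check in the quoted passage can be sketched as below. This is a simplified illustration under two assumptions that are not in the quote: each response packet carries one path MTU of data, and the duplicate's data starts at the offset implied by its PSN. The function names are hypothetical, and the check for the responder's duplicate PSN region is left out.

```python
PSN_MASK = 0xFFFFFF  # PSNs are 24-bit and wrap around


def response_packet_count(length: int, pmtu: int) -> int:
    """Number of PSNs allocated to a read response (at least one)."""
    return max(1, -(-length // pmtu))  # ceiling division


def is_valid_duplicate_read(orig_psn: int, orig_len: int,
                            dup_psn: int, dup_len: int, pmtu: int) -> bool:
    """Check a duplicate RDMA READ against the original request."""
    # Distance from the original's first PSN, computed modulo 2**24.
    delta = (dup_psn - orig_psn) & PSN_MASK
    if delta >= response_packet_count(orig_len, pmtu):
        return False  # PSN outside the original response's range
    # Assumed starting byte offset of the duplicate within the original.
    start = delta * pmtu
    # The requested data must be entirely contained in the original extent.
    return start + dup_len <= orig_len
```

For instance, with a 64 KB original read and an assumed 4096-byte path MTU, a duplicate starting three packets in may ask for at most the remaining 53248 bytes. A longer request, or one starting at or beyond the 16th PSN, fails the check and could be silently dropped.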