Short version:

I have two main questions:

  • Are DMA writes allowed to pass DMA reads? If so, is there any way to stop them from doing so (e.g. by setting a flag)?

  • Do DMA reads follow any byte ordering, e.g. left-to-right or right-to-left, or is the order non-deterministic? Is there any way to enforce a left-to-right ordering, the way DMA writes have one?

Long version:

I am not really familiar with the terminology of this subject, so please excuse any mistakes.

The case I have is as follows:

An array of numbers is stored in main memory, and the network card issues a sequence of DMA reads (each covering the entire array) and DMA writes (each incrementing one cell of the array). Let's say the array is initially all zeros, like this:

| index | value |
|-------|-------|
|   A   |   0   |
|   B   |   0   |

And these network DMAs are the only way this array is read or written (i.e. there is no CPU involvement).

Let's say we have this schedule:

R, R, W(A), W(B)

I was curious to know whether it is possible that one of the reads returns {0, 1} while the other returns {1, 0}.

I wrote a simple program to check this, and it turns out this scenario is indeed possible. I am puzzled as to what the reason could be.
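To make the setup concrete, here is a minimal sketch of how such requests are posted with libibverbs. This is not my actual program: `qp` (an already-connected RC queue pair), `mr` (the registered local buffer), and `remote_addr`/`rkey` (describing the remote array) are all assumed to already exist.

```c
#include <stdint.h>
#include <infiniband/verbs.h>

/* Post one RDMA read or write work request on an already-connected
 * RC queue pair. All setup (device, PD, CQ, QP, registration,
 * connection) is omitted; this only shows the posting step. */
static int post_rdma(struct ibv_qp *qp, struct ibv_mr *mr,
                     void *local, uint64_t remote_addr, uint32_t rkey,
                     uint32_t len, enum ibv_wr_opcode opcode)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local,
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = opcode,  /* IBV_WR_RDMA_READ or IBV_WR_RDMA_WRITE */
        .send_flags = IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad_wr;

    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;
    return ibv_post_send(qp, &wr, &bad_wr);
}
```

The schedule above is then just four such posts: two `IBV_WR_RDMA_READ`s covering the whole array, followed by two `IBV_WR_RDMA_WRITE`s of one cell each.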

I am not sure at all, but my guess is that this is caused by two things: transaction ordering in PCIe (section 2.4.1 of the PCIe specification says that writes can pass reads, but not vice versa; I didn't understand rule B2b, though), and DMA reads not following any deterministic byte ordering when accessing their data (for the latter I couldn't find any reliable resource).

For example, this might be what happened under the hood: the first read starts from the beginning of the array and the second read starts from the end. The writes then take place, after which the first read scans the second index and the second read scans the first index.
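Spelled out as a plain C model (purely illustrative; it replays that one interleaving by hand and says nothing about what the hardware actually does):

```c
#include <stdio.h>

/* Deterministic replay of one possible interleaving of the schedule
 * R, R, W(A), W(B), where each read fetches the two cells one at a
 * time rather than atomically. Read 1 scans A then B; read 2 scans
 * B then A. */
int main(void) {
    int mem[2] = {0, 0};   /* mem[0] = A, mem[1] = B */
    int r1[2], r2[2];

    r1[0] = mem[0];        /* read 1 fetches A (sees 0) */
    r2[1] = mem[1];        /* read 2 fetches B (sees 0) */
    mem[0] = 1;            /* W(A)                      */
    mem[1] = 1;            /* W(B)                      */
    r1[1] = mem[1];        /* read 1 fetches B (sees 1) */
    r2[0] = mem[0];        /* read 2 fetches A (sees 1) */

    printf("read 1 saw {A=%d, B=%d}\n", r1[0], r1[1]);  /* {0, 1} */
    printf("read 2 saw {A=%d, B=%d}\n", r2[0], r2[1]);  /* {1, 0} */
    return 0;
}
```

If the NIC is free to fetch the bytes of a read request in either direction, this interleaving would produce exactly the {0, 1} / {1, 0} pair I observed.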

  • It would help considerably if you have some code here that demonstrates what you're trying to do in a concrete form that people can examine. If you need things to happen in a particular order you probably need to engage some kind of lock semantics in order to ensure that happens. Any time you have buffering you have race conditions if you're not super careful. – tadman May 01 '17 at 21:39
  • @tadman unfortunately, it is not possible in this case. My program uses the InfiniBand library to send remote DMA requests to a remote machine. Any working code would be quite large, and most of it would be irrelevant to the question. Besides, not too many people have access to a NIC with the RDMA feature. – narengi May 01 '17 at 21:44
  • It sounds like you're asking for race conditions if that's the case. I don't think DMA in general has any rules, and the PCIe specification, such as it is, may not necessarily apply to the fullest extent due to other components being involved. You're going to need to confirm your writes before doing any reads, and even then you may need to find a way to do atomic writes to ensure they're all flushed before you start accessing other data. – tadman May 01 '17 at 21:47
  • @tadman I see. In the schedule that I wrote, the problem is actually not the writes going uncommitted, but it's the reads being done non-atomically and accessing the memory in a non-deterministic way. Does this mean that the NIC driver has to be modified such that it waits until it gets the completion for a DMA read before moving on to the next write? – narengi May 02 '17 at 00:44
  • I can only speak in terms of theory here. In practice you're going to have to aggressively test this code to determine the exact characteristics of the system you're using. Hopefully the way it behaves is predictable, or at least understandable. – tadman May 02 '17 at 00:55
  • I can recommend this very detailed book: [PCI Express Technology 3.0](https://www.mindshare.com/Books/Titles/PCI_Express_Technology_3.0). – Paebbels May 02 '17 at 06:59