Can I call dma_map_single() on DeviceB using an address returned from dma_alloc_coherent() on DeviceA?

Question

I am writing a custom Linux driver that needs to DMA memory between multiple PCIe devices. I have the following situation:

- I'm using dma_alloc_coherent to allocate memory for DeviceA.
- I then use DeviceA to fill the memory buffer.

Everything is fine so far, but at this point I would like to DMA the memory to DeviceB, and I'm not sure of the proper way to do it.

For now I am calling dma_map_single for DeviceB using the address returned from dma_alloc_coherent called on DeviceA. This seems to work fine on x86_64, but it feels like I'm breaking the rules because:

1. dma_map_single is supposed to be called with memory allocated from kmalloc ("and friends"). Is it a problem to call it with an address returned from another device's dma_alloc_coherent call?
2. If #1 is "ok", then I'm still not sure whether it is necessary to call the dma_sync_* functions, which are needed for dma_map_single memory. Since the memory was originally allocated with dma_alloc_coherent, it should be uncached memory, so I believe the answer is "dma_sync_* calls are not necessary", but I am not sure.

I'm worried that I'm just getting lucky that this works, and that a future kernel update will break it, since it is unclear whether I'm following the API rules correctly. My code will eventually have to run on ARM and PPC too, so I need to make sure I'm doing things in a platform-independent manner instead of getting by with some x86_64-specific hack.

I'm using this as a reference: https://www.kernel.org/doc/html/latest/core-api/dma-api.html

There is a [Buffer Sharing and Synchronization](https://www.kernel.org/doc/html/v4.16/driver-api/dma-buf.html) section about "[...]sharing buffers for hardware (DMA) access across multiple device drivers and subsystems, and for synchronizing asynchronous hardware access" in [The Linux driver implementer’s API guide](https://www.kernel.org/doc/html/v4.16/driver-api/index.html), which might be a useful resource. – Pixelchemist, Jul 26 '21 at 22:42
You are correct, using the returned address for `dma_map_single()` smells like a hack. Since you are working with PCIe devices, you should be able to use P2P DMA transfers. – 0andriy, Jul 27 '21 at 07:11
@0andriy this is not P2P DMA. P2P DMA is where devices communicate directly via DMA, i.e. one device performs read/write operations against the BAR space of another device. System memory is not involved at all in P2P DMA. What OP is describing is a shared buffer in system memory that's accessed via DMA from two different devices. – alex.forencich, Aug 05 '22 at 00:15

score 0 · Answer 1 · edited Jul 27 '21 at 13:49


dma_alloc_coherent() acts similarly to __get_free_pages(), but with size granularity rather than page granularity, so I would guess there is no issue here.
First, call dma_mapping_error() after dma_map_single() to catch any platform-specific mapping failure. The dma_sync_*() helpers are used with streaming DMA mappings to keep the device and the CPU in sync. At a minimum, dma_sync_single_for_cpu() is required, because a buffer modified by the device needs to be synced before the CPU uses it.
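
A minimal sketch of the sequence described above: map the buffer for DeviceB, check the result with dma_mapping_error(), and bracket any CPU access with dma_sync_single_for_cpu() and dma_sync_single_for_device(). The helper names (map_for_device_b, cpu_access_window, unmap_from_device_b) and the choice of DMA_BIDIRECTIONAL are assumptions made for illustration. Everything in the `#else` branch is a hypothetical userspace stand-in so the control flow can be compiled outside the kernel tree; it performs no real mapping or cache maintenance. The sketch does not settle whether passing an address from another device's dma_alloc_coherent() to dma_map_single() is valid on every architecture.

```c
/*
 * Sketch only: streaming-map a buffer for a second device, check the
 * mapping, and bracket CPU access with dma_sync_*(). In a real driver this
 * builds against the kernel headers; the #else branch holds hypothetical
 * stand-ins that only record which calls were made.
 */
#ifdef __KERNEL__
#include <linux/dma-mapping.h>
#include <linux/errno.h>
#else
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins, NOT the kernel implementations. */
typedef uint64_t dma_addr_t;
enum dma_data_direction { DMA_BIDIRECTIONAL = 0, DMA_TO_DEVICE = 1, DMA_FROM_DEVICE = 2 };
#define DMA_MAPPING_ERROR (~(dma_addr_t)0)
struct device { int fail_map, syncs_for_cpu, syncs_for_device, unmaps; };

static dma_addr_t dma_map_single(struct device *dev, void *ptr, size_t size,
                                 enum dma_data_direction dir)
{
    assert(dev && ptr && size);
    (void)dir;
    return dev->fail_map ? DMA_MAPPING_ERROR : (dma_addr_t)(uintptr_t)ptr;
}

static int dma_mapping_error(struct device *dev, dma_addr_t addr)
{
    (void)dev;
    return addr == DMA_MAPPING_ERROR;
}

static void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
                                    size_t size, enum dma_data_direction dir)
{
    (void)addr; (void)size; (void)dir;
    dev->syncs_for_cpu++;
}

static void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
                                       size_t size, enum dma_data_direction dir)
{
    (void)addr; (void)size; (void)dir;
    dev->syncs_for_device++;
}

static void dma_unmap_single(struct device *dev, dma_addr_t addr,
                             size_t size, enum dma_data_direction dir)
{
    (void)addr; (void)size; (void)dir;
    dev->unmaps++;
}
#endif

/* Hypothetical helper: create the streaming mapping and check it. */
static int map_for_device_b(struct device *dev_b, void *buf, size_t len,
                            dma_addr_t *handle)
{
    dma_addr_t addr = dma_map_single(dev_b, buf, len, DMA_BIDIRECTIONAL);

    if (dma_mapping_error(dev_b, addr))
        return -ENOMEM;   /* the mapping failed; do not use addr */

    *handle = addr;
    return 0;
}

/* Hypothetical helper: take the buffer for the CPU, then give it back. */
static void cpu_access_window(struct device *dev_b, dma_addr_t handle, size_t len)
{
    dma_sync_single_for_cpu(dev_b, handle, len, DMA_BIDIRECTIONAL);
    /* ... the CPU may read or modify the buffer here ... */
    dma_sync_single_for_device(dev_b, handle, len, DMA_BIDIRECTIONAL);
}

/* Hypothetical helper: tear the mapping down once DeviceB is done. */
static void unmap_from_device_b(struct device *dev_b, dma_addr_t handle, size_t len)
{
    dma_unmap_single(dev_b, handle, len, DMA_BIDIRECTIONAL);
}
```

The direction passed to the sync and unmap calls has to match the one used when mapping, which is why all three helpers use the same constant.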

edited Jul 27 '21 at 13:49 by 0andriy · answered Jul 27 '21 at 07:36 by tej parkash

@tej why do you think dma_sync_* functions are necessary in this case? The memory was allocated with dma_alloc_coherent(), which is uncached memory by definition. The dma_sync_* functions are for the different architectures to sync their caches, so I don't believe they would have any effect on uncached memory. – Vern Jul 27 '21 at 14:12
According to the Linux documentation for dma_alloc_coherent(): "Consistent memory is memory for which a write by either the device or the processor can immediately be read by the processor or device without having to worry about caching effects. (You may however need to make sure to flush the processor's write buffers before telling devices to read that memory.)" So we kind of need to use dma_sync_*() for uncached memory. – tej parkash Jul 28 '21 at 10:13
Couldn't an appropriate memory barrier (i.e. wmb() in the Linux kernel or the GCC builtin __sync_synchronize()) be used to satisfy the "flushing the processor's write buffers" requirement? I didn't put this detail in the question, but this memory will be mmap'd to userspace, where I won't have access to the dma_sync_* functions. I don't want the extra overhead of an ioctl just so my kernel module can perform the dma_sync_*. – Vern Jul 29 '21 at 14:44
Thanks everyone for the comments. It doesn't seem like there are any huge red flags for my scheme, but I'm definitely on the fringe of how the API was intended to be used. From all my testing so far, this seems to be working. I just hope future kernels don't break it. – Vern Jul 29 '21 at 14:50
