cudaMallocPitch for managed memory

Question

I just realized that CUDA only offers cudaMallocManaged to allocate managed memory. But what should I do when I need to allocate a 2D or 3D array, which would normally be done with cudaMallocPitch for better coalescing? There are no managed-memory counterparts of these pitched allocation functions.

There is currently no analog to `cudaMallocPitch` for Unified/Managed Memory. You can build your own pitched allocation using `cudaMallocManaged` and some basic knowledge of GPU requirements for coalescing. Simply round the width of each row of your data array (in bytes) up to the next reasonable alignment boundary (perhaps 128 bytes), so that every row starts at an address that allows perfect coalescing, and allocate based on that. Your pitch is then the padded row width you actually allocated. Since `cudaMemcpy`-type operations are more or less eliminated with UM, the pitch affects both host and device access to the data. — Robert Crovella, Apr 17 '15 at 16:54
That's exactly what I'm doing currently. Is there a specific reason why these functions do not exist? — Michael, Apr 17 '15 at 16:58
Pitched allocations are typically not used on the host side, and on the device side they may serve specific alignment purposes for certain specific uses (texturing, surfaces, etc.), but apart from that, the general reason for a pitched allocation is for performance. Specialized memory types like cudaArray, textures etc. are not currently supported by UM, so that leaves us with the performance question. — Robert Crovella, Apr 21 '15 at 23:18
The tradeoff is a possible performance gain on the device side against additional, atypical complexity (and perhaps reduced performance) on the host side, since, as mentioned above, a managed pitched allocation would impose pitched access on both the host and the device. That suggests to me that there may not be much motivation to provide these functions in a UM environment. GPU caches, in place since compute capability (cc) 2.0 devices appeared, have to some extent mitigated the performance impact of not using pitched allocations on the GPU. — Robert Crovella, Apr 21 '15 at 23:19
Pitched memory is crucial for texturing, but it also has performance implications for memory coalescing. Also, DMA transfers can be slowed down considerably by small pitches when using cudaMemcpy2D. — wcochran, Nov 16 '17 at 21:24
You can't do texturing from managed memory, so that seems irrelevant for the question here. — Robert Crovella, Oct 06 '21 at 14:14
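
The workaround described in the first comment (round each row up to an alignment boundary, then allocate with `cudaMallocManaged`) might be sketched roughly as follows. This is only an illustration under stated assumptions: the helpers `paddedPitchBytes`, `mallocManagedPitch` and `rowPtr` are hypothetical names, not part of the CUDA runtime; the 128-byte alignment is the value suggested in the comment rather than something queried from the device; and the sketch assumes the pointer returned by `cudaMallocManaged` is itself aligned to at least that boundary.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: round a row width in bytes up to a multiple of `alignment`.
// 128 bytes is an assumption taken from the comment thread, not a queried device property.
__host__ __device__ inline size_t paddedPitchBytes(size_t widthBytes, size_t alignment = 128) {
    return (widthBytes + alignment - 1) / alignment * alignment;
}

// Hypothetical helper shaped like cudaMallocPitch, but backed by cudaMallocManaged.
inline cudaError_t mallocManagedPitch(void** ptr, size_t* pitch, size_t widthBytes, size_t height) {
    *pitch = paddedPitchBytes(widthBytes);
    return cudaMallocManaged(ptr, *pitch * height);
}

// Address of the first element of `row`; used identically on host and device,
// because the same pitched layout is visible to both.
__host__ __device__ inline float* rowPtr(float* base, size_t pitch, size_t row) {
    return reinterpret_cast<float*>(reinterpret_cast<char*>(base) + row * pitch);
}

// Multiply every element of a pitched 2D array by `factor`.
__global__ void scaleRows(float* data, size_t pitch, int width, int height, float factor) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        rowPtr(data, pitch, row)[col] *= factor;
    }
}

int main() {
    const int width = 100, height = 4;  // 100 floats = 400 bytes per row before padding
    float* data = nullptr;
    size_t pitch = 0;

    if (mallocManagedPitch(reinterpret_cast<void**>(&data), &pitch,
                           width * sizeof(float), height) != cudaSuccess) {
        fprintf(stderr, "managed allocation failed\n");
        return 1;
    }

    // The host has to honor the pitch too; there is no cudaMemcpy2D step to hide it.
    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c)
            rowPtr(data, pitch, r)[c] = 1.0f;

    dim3 block(32, 4);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    scaleRows<<<grid, block>>>(data, pitch, width, height, 2.0f);

    // Wait for the kernel before touching managed memory on the host again.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(data);
        return 1;
    }

    printf("pitch = %zu bytes, last element = %f\n",
           pitch, rowPtr(data, pitch, height - 1)[width - 1]);
    cudaFree(data);
    return 0;
}
```

Note that the host loop indexes through `rowPtr` as well: as the comments point out, a managed pitched layout imposes the pitch on both sides, which is part of the tradeoff being discussed.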

0 Answers