I have:
- Host memory that has been successfully pinned and mapped using `cudaHostAlloc(..., cudaHostAllocMapped)` or `cudaHostRegister(..., cudaHostRegisterMapped)`;
- Device pointers that have been obtained using `cudaHostGetDevicePointer(...)`.

I initiate `cudaMemcpy(..., cudaMemcpyDeviceToDevice)` on src and dest device pointers that point to two different regions of pinned+mapped memory obtained by the technique above.
Everything works fine.
Question: should I continue doing this, or just use a traditional CPU-style `memcpy()`, since everything is in system memory anyway? Or are they the same (i.e. does `cudaMemcpy` map to a straight `memcpy` when both src and dest are pinned)?
(I am still using the `cudaMemcpy` method because previously everything was in device global memory, but I have since switched to pinned memory due to global memory size constraints.)
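
For concreteness, here is a minimal sketch of the setup and the two alternatives I am asking about. It is not my real code: the buffer names, the 1 MiB size, and the `CHECK` macro are placeholders made up for illustration. I added a `cudaDeviceSynchronize()` before reading the result on the host, on the assumption that a device-to-device `cudaMemcpy` is not guaranteed to be synchronous with respect to the host.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

int main() {
    const size_t n = 1 << 20;  // placeholder buffer size in bytes

    // Request mapped pinned memory support before the context is created.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Two separate pinned + mapped host buffers.
    unsigned char *h_src = nullptr, *h_dst = nullptr;
    CHECK(cudaHostAlloc((void **)&h_src, n, cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void **)&h_dst, n, cudaHostAllocMapped));
    memset(h_src, 0xAB, n);
    memset(h_dst, 0, n);

    // Device-side aliases of the same host memory.
    unsigned char *d_src = nullptr, *d_dst = nullptr;
    CHECK(cudaHostGetDevicePointer((void **)&d_src, h_src, 0));
    CHECK(cudaHostGetDevicePointer((void **)&d_dst, h_dst, 0));

    // Option A: what I do now, a "device to device" copy on the aliases.
    CHECK(cudaMemcpy(d_dst, d_src, n, cudaMemcpyDeviceToDevice));
    CHECK(cudaDeviceSynchronize());  // make sure the copy is done before the host reads
    bool ok_cuda = (memcmp(h_dst, h_src, n) == 0);

    // Option B: a plain host memcpy on the underlying host pointers.
    memset(h_dst, 0, n);
    memcpy(h_dst, h_src, n);
    bool ok_host = (memcmp(h_dst, h_src, n) == 0);

    printf("cudaMemcpy copy %s, memcpy copy %s\n",
           ok_cuda ? "matches" : "differs",
           ok_host ? "matches" : "differs");

    CHECK(cudaFreeHost(h_src));
    CHECK(cudaFreeHost(h_dst));
    return (ok_cuda && ok_host) ? 0 : 1;
}
```

Both options produce the same bytes in the destination buffer; what I want to know is which one is preferable, and whether they end up doing the same thing under the hood.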