CUDA: Differences between HtoD and DtoH bandwidth

Question

Yet another bandwidth-related question. I expected the device-to-host and host-to-device bandwidth plots to be similar, but I see a significant difference between the two. Since both directions follow the same route, shouldn't the effective bandwidth be the same? The testbed consists of a total of 12 Intel Westmere CPUs on two sockets and 4 Tesla C2050 GPUs in 4 PCI Express Gen2 slots. I am using the bandwidthTest program from the NVIDIA code samples.

What are the overheads of doing a cudaMemcpy from the host vs the device?

Interesting question. I seem to get similar results for my M2050 and the opposite result for my S1070. The results are very similar - as are yours - but I, too, wonder where the discrepancy comes in. — Patrick87, Aug 11 '11 at 20:17

score 5 · Accepted Answer · answered Aug 11 '11 at 23:49

First, I would say those two curves are similar. I can honestly say that I've never seen symmetric PCI-e bandwidth on any system I have used -- and that includes both CUDA and graphics (OpenGL/D3D) tests, so I don't think it's something (especially this small difference) that should concern you.

As with your other PCI-e bandwidth question, the answer is similar -- the driver may use different strategies for different types and sizes of transfers, attempting to get the highest throughput possible.

Actual throughput depends on many factors, including the type of GPU, and especially on the host chipset in use.
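To see how the two directions compare on a particular machine, one option is to time the copies directly. The following is a minimal sketch, not the bandwidthTest source: it assumes a single GPU (device 0), page-locked host memory from cudaMallocHost, and transfer sizes from 1 KiB to 64 MiB. The helper `measure`, the `CHECK` macro, and the repetition count are names and values chosen here for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Time `reps` copies of `bytes` bytes in one direction; return GB/s.
static double measure(void* dst, const void* src, size_t bytes,
                      cudaMemcpyKind kind, int reps)
{
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaMemcpy(dst, src, bytes, kind));  // warm-up copy, not timed

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaMemcpy(dst, src, bytes, kind));
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));

    // Total bytes moved divided by elapsed seconds, in units of 1e9 bytes/s.
    return static_cast<double>(bytes) * reps / (ms * 1e-3) / 1e9;
}

int main()
{
    const int reps = 20;                 // assumed repetition count
    const size_t maxBytes = 64u << 20;   // 64 MiB, assumed upper bound

    void* hPinned = nullptr;
    void* dBuf = nullptr;
    CHECK(cudaMallocHost(&hPinned, maxBytes));  // page-locked host buffer
    CHECK(cudaMalloc(&dBuf, maxBytes));
    std::memset(hPinned, 0, maxBytes);
    CHECK(cudaMemset(dBuf, 0, maxBytes));

    std::printf("%12s %12s %12s\n", "bytes", "HtoD GB/s", "DtoH GB/s");
    for (size_t bytes = 1u << 10; bytes <= maxBytes; bytes <<= 2) {
        double h2d = measure(dBuf, hPinned, bytes, cudaMemcpyHostToDevice, reps);
        double d2h = measure(hPinned, dBuf, bytes, cudaMemcpyDeviceToHost, reps);
        std::printf("%12zu %12.2f %12.2f\n", bytes, h2d, d2h);
    }

    CHECK(cudaFreeHost(hPinned));
    CHECK(cudaFree(dBuf));
    return 0;
}
```

Compiled with nvcc and run on the machine in question, this prints one row per transfer size, so any asymmetry shows up per size rather than as a single number. Swapping cudaMallocHost for plain malloc would measure pageable transfers instead, which may behave differently.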

Thank you for your answers. I always see HtoD bandwidth being lower than DtoH. — Sayan, Aug 12 '11 at 01:15
