
I would like to upload two images to GPU memory, and I'm interested in how fast I can do this.

In fact - will it be faster to compare two bitmaps in RAM with the CPU, or to upload them to the GPU and use GPU parallelism to do the comparison there?

Johan Kotlinski
Daniel Mošmondor

3 Answers


If you run the CUDA device bandwidth sample, you'll get a benchmark for the upload speed.

Assuming DDR3 tri-channel 1600MHz RAM, you'll get something like 38 GB/s memory bandwidth.

Take a typical midrange card like a GTX460 and you'll get something like 84 GB/s memory bandwidth. Note that you'll have to make a hop across the bus which is something like 8GB/s theoretical, ~5.5 in practice for a PCI-E2.0 x16 link.

Note that kotlinski's answer isn't quite correct. You can do the comparisons in parallel and then do a parallel reduction, in which case the larger GPU device bandwidth can eventually win out.
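The compare-then-reduce pattern looks like this as a CPU sketch in standard C++ (`count_diffs` is a hypothetical helper; a CUDA version would assign one thread per pixel and sum the per-thread results with a tree reduction in shared memory):

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Count differing pixels: an elementwise compare followed by a reduction.
// This is the map/reduce structure a GPU implementation parallelizes.
std::size_t count_diffs(const std::vector<std::uint32_t>& a,
                        const std::vector<std::uint32_t>& b) {
    return std::transform_reduce(a.begin(), a.end(), b.begin(),
                                 std::size_t{0}, std::plus<>{},
                                 [](std::uint32_t x, std::uint32_t y) {
                                     return std::size_t{x != y};
                                 });
}
```

For example, comparing `{1, 2, 3, 4}` against `{1, 9, 3, 7}` yields 2 differing elements.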

I think the answer is likely to be: a loss if you upload to the GPU and compare once; a possible gain if the comparison is made multiple times (with the images kept and modified on the GPU, for example).

Edit:

The multiple-times comparison refers to modifying the images in situ in GPU memory. Each modification would merit another comparison (caching doesn't cut it), while not incurring the penalty of another copy across the bus.

peakxu

Since memory access is the bottleneck here, it is extremely likely that it is faster to just do it on the CPU. Making it run in parallel is not likely to gain you anything; memory access is essentially a serial operation.
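To illustrate: a sequential compare already streams both buffers through memory exactly once, leaving little for extra threads to speed up (a minimal sketch; `bitmaps_equal` is a hypothetical helper):

```cpp
#include <cstring>
#include <vector>

// A straight byte-wise compare. The CPU reads each buffer once,
// so throughput is bounded by memory bandwidth, not compute.
bool bitmaps_equal(const std::vector<unsigned char>& a,
                   const std::vector<unsigned char>& b) {
    return a.size() == b.size() &&
           std::memcmp(a.data(), b.data(), a.size()) == 0;
}
```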

Johan Kotlinski

The answer to this question is highly debatable and depends entirely on your system's configuration. This means that you'll have to do the benchmarks yourself. Factors that could influence your situation:

  1. Speed of your RAM
  2. Speed of the GPU Bus
  3. Whether or not you have shared RAM between GPU & CPU

However, I do think that in the general case (e.g. with bus speeds on the order of GB/s) it's faster to upload the images to the GPU and do the difference comparison there.

Jasper Bekkers
  • I don't know how you can conclude your last sentence from just the bus speeds. CPUs typically have more compute than bandwidth, and the CPU-GPU typically has less (or equal if shared) bandwidth. So how exactly does moving all the data to the GPU help ? – Bahbar May 11 '11 at 15:04
  • @Bahbar I was thinking of the actual parallelism on the GPU side while processing but TBH this entire question is extremely system dependent and can easily go both ways. Eg, core count and a SIMD implementation could definitely help in favor of the CPU. The real question of course is how much data does he need to process to begin with. – Jasper Bekkers May 11 '11 at 15:18
  • The GPU being faster, including the transfer from the CPU, would mean that the CPU does not have enough compute to effectively use its available bandwidth (since the CPU-GPU bandwidth is smaller than the CPU's own, and we're talking about a single difference per memory item - that's critical to the point). I'm saying no system has those characteristics (really, why give the CPU so much bandwidth if it can never compute on all of it?) – Bahbar May 11 '11 at 16:36