I was reading the presentation Optimizing Parallel Reduction in CUDA by Mark Harris. Here is the slide I have a problem with:

[Image: slide from the presentation]

It says there is a bank conflict problem in this method. But why? Each thread accesses two consecutive memory cells, which are in different banks, and no two threads access the same memory cell concurrently.

Majid Azimi

1 Answer

This presentation dates from the very early days of CUDA and applies to first-generation hardware.

That hardware had shared memory arranged in 16 banks, each 32 bits wide. Because every sixteenth 32-bit entry in the shared array resides in the same bank, there are bank conflicts at a number of levels of that reduction tree.
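The bank mapping can be checked with a small host-side simulation. This is only a sketch: `conflict_degree` and its parameters are made-up names, and the bank count and the number of threads served together are passed in rather than queried from hardware. It assumes that successive 32-bit words map to successive banks, that every thread in the group is active, and that thread `t` touches word `t * stride_words` (as with a `2 * s * tid` index, assuming the slide is the strided-index version of the kernel).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Worst-case number of threads in one group that land in the same bank
// when thread t accesses the 32-bit word at index t * stride_words.
// A result of 1 means the access is conflict-free; k means a k-way conflict.
// All parameters are assumptions of this sketch, not values read from a GPU.
int conflict_degree(int num_banks, int group_size, int stride_words) {
    assert(num_banks > 0 && group_size > 0 && stride_words > 0);
    std::vector<int> hits(num_banks, 0);
    for (int t = 0; t < group_size; ++t) {
        int word = t * stride_words;   // word index touched by thread t
        ++hits[word % num_banks];      // successive words -> successive banks
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

For example, with `num_banks = 16` and `group_size = 16`, a stride of 2 words gives a degree of 2 and a stride of 4 gives a degree of 4, while a stride of 1 gives a degree of 1, i.e. no conflict. So even though each thread's own two cells sit in different banks, different threads in the same group collide with each other.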

This problem was addressed in newer hardware, where the number of banks was expanded to 32, meaning that this sort of bank conflict cannot occur.

talonmies
  • Doesn't it still occur when the threads are all reading from addresses at multiples of 32 (* 4 bytes)? – mirgee May 24 '18 at 14:49
  • @mirgee: The solution is to use sequential addressing instead of interleaved addressing, so that all 8 writes go to different banks and the access is totally conflict-free. See slide 14 of https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf. – Mehrshad Zandigohar May 19 '21 at 00:02
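
To make the difference between the two addressing schemes concrete, here is a rough host-side comparison. It is a sketch under assumptions, not code from the slides: `max_bank_hits`, `strided_reads` and `sequential_reads` are hypothetical helpers, banks are assumed to hold successive 32-bit words, and every thread in the group is assumed active. The strided variant models a read of `sdata[2 * s * tid + s]`, the sequential one a read of `sdata[tid + s]`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Largest number of accesses that fall into a single bank (1 = conflict-free).
int max_bank_hits(const std::vector<int>& words, int num_banks) {
    assert(num_banks > 0);
    std::vector<int> hits(num_banks, 0);
    for (int w : words) ++hits[w % num_banks];
    return *std::max_element(hits.begin(), hits.end());
}

// Word indices read by one group of threads with a strided index:
// thread tid reads word 2 * s * tid + s.
std::vector<int> strided_reads(int group_size, int s) {
    std::vector<int> words;
    for (int tid = 0; tid < group_size; ++tid) words.push_back(2 * s * tid + s);
    return words;
}

// Word indices read by one group of threads with sequential addressing:
// thread tid reads word tid + s, so neighbouring threads hit neighbouring banks.
std::vector<int> sequential_reads(int group_size, int s) {
    std::vector<int> words;
    for (int tid = 0; tid < group_size; ++tid) words.push_back(tid + s);
    return words;
}
```

With a group no larger than the number of banks, `sequential_reads` always yields a maximum of one hit per bank regardless of `s`, whereas `strided_reads` already yields two hits per bank at `s = 1`.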