Bank conflict in parallel reduction using interleaved addressing method

Question

I was reading the presentation "Optimizing Parallel Reduction in CUDA" by Mark Harris. Here is a slide I have a problem with:

It says there is a bank conflict problem in this method. But why? Each thread accesses two consecutive memory cells, which are in different banks, and no two threads access the same memory cell concurrently.

talonmies · Accepted Answer · 2016-12-01T15:24:49.640

7

This presentation dates from the very early days of CUDA, and applies to first generation hardware.

That hardware had shared memory arranged in 16 banks, each 32 bits wide. Because every sixteenth 32-bit entry in the shared array resides in the same bank, two threads that touch different entries can still collide on one bank, so there are bank conflicts at a number of levels of that reduction tree.
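To make the bank arithmetic concrete, here is a minimal host-side C++ sketch (not CUDA code, and not taken from the presentation). It assumes the kernel on the slide uses the strided index `2 * s * tid`, a bank is chosen as `word_index % banks`, and conflicts are resolved per group of threads (a half-warp of 16 on first-generation hardware). The function name `max_conflict_degree` and its parameters are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Worst-case number of distinct shared-memory words that land in one bank
// when `group` threads each touch sdata[2 * s * tid].
// Assumption: this strided index models the interleaved-addressing kernel.
int max_conflict_degree(int s, int banks, int group, int block_size) {
    std::vector<std::set<int>> words_per_bank(banks);
    for (int tid = 0; tid < group; ++tid) {
        int index = 2 * s * tid;            // word touched by this thread
        if (index >= block_size) continue;  // thread is idle at this level
        words_per_bank[index % banks].insert(index);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank)
        worst = std::max(worst, words.size());
    // 1 means conflict-free; k means a k-way bank conflict.
    return static_cast<int>(worst);
}
```

Under these assumptions, `max_conflict_degree(1, 16, 16, 128)` returns 2: threads 0 and 8 touch words 0 and 16, which are different cells but sit in the same bank. With `s = 2` the degree rises to 4, and with `s = 4` to 8.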

This problem was addressed in newer hardware, where the number of banks was expanded to 32, meaning that this sort of bank conflict cannot occur.

edited Dec 01 '16 at 15:24

answered Nov 21 '16 at 20:08

talonmies


1

Doesn't it still occur when the threads are all reading from addresses at multiples of 32 (* 4 bytes)? – mirgee May 24 '18 at 14:49
@mirgee: The solution is to use sequential addressing instead of interleaved addressing. Then all 8 writes go to different banks and the access is totally conflict-free. See slide 14 of https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf. – Mehrshad Zandigohar May 19 '21 at 00:02
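For illustration, the sequential-addressing idea from the comment above can be modelled on the host as follows. This is a hedged C++ sketch, not the kernel from the slides. It assumes thread `tid` reads `sdata[tid + s]` and writes `sdata[tid]`, that the input length is a power of two, and that a bank is `word_index % banks`. The names `ReductionResult` and `reduce_sequential` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

struct ReductionResult {
    long long sum;       // value left in sdata[0] after the tree reduction
    bool conflict_free;  // true if no level mapped two threads to one bank
};

// Host-side model of a tree reduction with sequential addressing.
// Assumption: sdata.size() is a power of two and group <= banks.
ReductionResult reduce_sequential(std::vector<long long> sdata,
                                  std::size_t banks, std::size_t group) {
    bool conflict_free = true;
    for (std::size_t s = sdata.size() / 2; s > 0; s /= 2) {
        // Check the first group of active threads: consecutive tids
        // touch consecutive words, hence consecutive banks.
        std::set<std::size_t> write_banks, read_banks;
        for (std::size_t tid = 0; tid < std::min(s, group); ++tid) {
            if (!write_banks.insert(tid % banks).second) conflict_free = false;
            if (!read_banks.insert((tid + s) % banks).second) conflict_free = false;
        }
        // Perform this level of the reduction.
        for (std::size_t tid = 0; tid < s; ++tid)
            sdata[tid] += sdata[tid + s];
    }
    return {sdata[0], conflict_free};
}
```

The design point is that the stride `s` is added as a constant offset rather than multiplied into the thread index, so neighbouring threads always land in neighbouring banks at every level of the tree.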
