I was reading the presentation "Optimizing Parallel Reduction in CUDA" by Mark Harris. Here is a slide I am having trouble with:
It says this method has a bank conflict problem. But why? Each thread accesses two consecutive memory cells, which are in different banks, and no two threads access the same memory cell concurrently.
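To make the access pattern concrete, here is a minimal host-side sketch that tabulates which bank each thread of one warp touches. It rests on several assumptions that are not stated above: that the slide in question is the interleaved-addressing kernel where each thread computes `index = 2 * s * tid` and then does `sdata[index] += sdata[index + s]`, that shared memory has 32 banks of 4-byte words (so word `i` sits in bank `i % 32`), and that a warp has 32 threads. The helper `maxThreadsPerBank` and the constants are hypothetical names for illustration, not part of the presentation.

```cpp
#include <algorithm>
#include <array>

// Assumption: 32 banks of 4-byte words, so word i lives in bank i % 32.
constexpr int kNumBanks = 32;
// Assumption: 32 threads per warp; only the first warp (tid 0..31) is examined.
constexpr int kWarpSize = 32;

// For stride s, count how many active threads of the first warp read
// sdata[index] from the same bank, assuming index = 2 * s * tid and the
// guard `index < blockSize`. All indices are distinct words, so two threads
// in the same bank are hitting different addresses in that bank.
int maxThreadsPerBank(int s, int blockSize) {
    std::array<int, kNumBanks> hits{};  // zero-initialized per-bank counters
    for (int tid = 0; tid < kWarpSize; ++tid) {
        int index = 2 * s * tid;
        if (index < blockSize) {
            ++hits[index % kNumBanks];
        }
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Calling `maxThreadsPerBank(s, 128)` for `s = 1, 2, 4, ...` shows, step by step, how many threads of the first warp land in one bank for the `sdata[index]` read. The `sdata[index + s]` read is a separate access and can be tabulated the same way by shifting the index by `s`.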