0

indirectJ2[MAX_SUPER_SIZE] is a shared array.

My cuda device kernel contains following statement (executed by all threads in the thread block):

int nnz_col = indirectJ2[MAX_SUPER_SIZE - 1];

I suspect this would cause bank conflicts.

Is there any way I can implement the above Thread-block level broadcast efficiently using new shuffle instructions for the kepler GPUs? I understand how it works at warp level. Other solutions, which are beyond shuffle instruction (for instance use of CUB etc.), are also welcome.

arbitUser1401
  • 575
  • 2
  • 8
  • 25

1 Answers1

2

There is no bank conflict for that line of code on K40. Shared memory accesses already offer a broadcast mechanism. Quoting from the programming guide

"A shared memory request for a warp does not generate a bank conflict between two threads that access any sub-word within the same 32-bit word or within two 32-bit words whose indices i and j are in the same 64-word aligned segment (i.e., a segment whose first index is a multiple of 64) and such that j=i+32 (even though the addresses of the two sub-words fall in the same bank): In that case, for read accesses, the 32-bit words are broadcast to the requesting threads "

There is no such concept as shared memory bank conflicts at the threadblock level. Bank conflicts only pertain to the access pattern generated by the shared memory request emanating from a single warp, for a single instruction in that warp.

If you like, you can write a simple test kernel and use profiler metrics (e.g. shared_replay_overhead) to test for shared memory bank conflicts.

Warp shuffle mechanisms do not extend beyond a single warp; therefore there is no short shuffle-only sequence that can broadcast a single quantity to multiple warps in a threadblock. Shared memory can be used to provide direct access to a single quantity to all threads in a warp; you are already doing that.

global memory, __constant__ memory, and kernel parameters can also all be used to "broadcast" the same value to all threads in a threadblock.

Robert Crovella
  • 143,785
  • 11
  • 213
  • 257