I am a bit confused about the thread communication and synchronization mechanisms provided by CUDA. Can someone help me verify the assumptions below?
Threads within a warp communicate using shared or global memory and synchronize implicitly.
Warps within a thread block communicate using shared or global memory and synchronize using barrier synchronization.
Thread blocks within a given grid or kernel communicate using global memory and synchronize using atomic memory operations.
Thread blocks from different grids or kernels communicate using global memory and synchronize implicitly.
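
To make the four levels concrete, here is a minimal CUDA sketch of how I picture them working together in a simple sum reduction. The names `sum_kernel`, `double_kernel` and `run_demo`, as well as the sizes, are made up for illustration. It assumes CUDA 9 or later (for `__shfl_down_sync`) and a block size that is a multiple of 32. For the warp level I used the explicit `__shfl_down_sync` primitive rather than relying on implicit lockstep execution, since I am not sure the implicit behaviour can be relied on; error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: adds the n elements of `in` into *out.
__global__ void sum_kernel(const int *in, int *out, int n) {
    __shared__ int warp_sums[32];  // one slot per warp (at most 32 warps per block)

    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % warpSize;
    int warp = threadIdx.x / warpSize;

    // Out-of-range threads contribute 0 but still take part in the shuffles.
    int v = (tid < n) ? in[tid] : 0;

    // Level 1: threads within a warp exchange registers via shuffles.
    // The *_sync primitive synchronizes the lanes named in the mask.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffffu, v, offset);

    // Level 2: warps within a block communicate through shared memory,
    // separated by a block-wide barrier.
    if (lane == 0) warp_sums[warp] = v;
    __syncthreads();

    if (warp == 0) {
        int num_warps = blockDim.x / warpSize;
        v = (lane < num_warps) ? warp_sums[lane] : 0;
        for (int offset = warpSize / 2; offset > 0; offset /= 2)
            v += __shfl_down_sync(0xffffffffu, v, offset);

        // Level 3: blocks within a grid combine results in global memory
        // with an atomic operation.
        if (lane == 0) atomicAdd(out, v);
    }
}

// Hypothetical second kernel that consumes the first kernel's result.
__global__ void double_kernel(int *out) {
    *out *= 2;
}

int run_demo() {
    const int n = 1000;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    assert(threads % 32 == 0);  // assumption stated above: full warps only

    int h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));

    sum_kernel<<<blocks, threads>>>(d_in, d_out, n);
    // Level 4: kernels launched into the same stream run in order, so the
    // second kernel sees everything the first wrote to global memory.
    double_kernel<<<1, 1>>>(d_out);

    int h_out = 0;
    // A blocking copy on the default stream also waits for both kernels.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

With every input element set to 1, `run_demo()` should return 2000 on a working device: the first kernel sums to 1000 and the second doubles it.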