If I have a __constant__ value
__constant__ float constVal;
which may or may not be initialized by MPI ranks on non-blocking streams:
cudaMemcpyToSymbolAsync((void*)&constVal, deviceValue, sizeof(float), 0, cudaMemcpyDeviceToDevice, stream);
Is this:
- Safe to be accessed by multiple MPI ranks simultaneously within kernels? I.e., do ranks share the same instance of constVal, or do MPI semantics (each rank has its own private copy) still hold?
- If the above is safe, is it also safe for multiple MPI ranks to initialize it?