Suppose I declare a local variable in a CUDA kernel function, so that each thread has its own copy:
float f = ...; // some calculations here
Suppose also that the compiler places this variable in local memory (which, as far as I know, is the same as global memory except that it is visible to one thread only). My question is: will reads of f be coalesced?
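To make the scenario concrete, here is a minimal sketch of the kind of kernel I have in mind. It is not my real code: the names `pickKernel`, `countMismatches`, `kSlots` and the `sel` index array are all hypothetical. I am also assuming that a per-thread array indexed with a run-time value is a typical candidate for local memory; the compiler is free to decide otherwise, so the placement has to be checked for each build.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kSlots = 8;  // hypothetical per-thread array size

// Each thread fills a private array and then reads one element at an index
// that is known only at run time. Dynamic indexing like this is a common
// reason for the compiler to keep the array in local memory instead of
// registers (an assumption here, not a guarantee).
__global__ void pickKernel(const int* sel, float* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float f[kSlots];
    for (int i = 0; i < kSlots; ++i)
        f[i] = static_cast<float>(tid * kSlots + i);

    // The read in question: every thread reads from its own copy of f.
    out[tid] = f[sel[tid] % kSlots];
}

// Runs the kernel on n threads and returns how many results differ from
// the value computed on the host.
int countMismatches(int n)
{
    std::vector<int> hSel(n);
    for (int i = 0; i < n; ++i) hSel[i] = (i * 3) % kSlots;

    int* dSel = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dSel, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dSel, hSel.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    pickKernel<<<blocks, threads>>>(dSel, dOut, n);

    // This copy waits for the kernel to finish before reading the results.
    std::vector<float> hOut(n);
    cudaMemcpy(hOut.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dSel);
    cudaFree(dOut);

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (hOut[i] != static_cast<float>(i * kSlots + hSel[i])) ++bad;
    return bad;
}

int main()
{
    printf("mismatches: %d\n", countMismatches(1024));
    return 0;
}
```

Compiling with `nvcc -Xptxas -v` makes ptxas print per-kernel resource usage, including any stack frame and spill bytes, which should show whether f actually ended up in local memory for a given architecture.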