In the worst case, does this sample allocate testCnt * xArray.Length bytes of storage in GPU global memory? How can we make sure that only one copy of the array is transferred to the device? The GpuManaged attribute seems to serve this purpose, but it doesn't solve our unexpected memory consumption.
    void Worker(int ix, byte[] array)
    {
        // process array - only read access
    }

    void Run()
    {
        var xArray = new byte[100];
        var testCnt = 10;
        Gpu.Default.For(0, testCnt, ix => Worker(ix, xArray));
    }
EDIT
The main question, stated more precisely: does each worker thread get a fresh copy of xArray, or is there only one copy of xArray shared by all threads?