I'm testing CUDAfy with a small gravity simulation. After profiling the code, I see that most of the time is spent in the GPU's CopyFromDevice method. Here's the code:
private void WithGPU(float dt)
{
    this.myGpu.CopyToDevice(this.myBodies, this.myGpuBodies);
    this.myGpu.Launch(1024, 1, "MoveBodies", -1, dt, this.myGpuBodies);
    this.myGpu.CopyFromDevice(this.myGpuBodies, this.myBodies);
}
Just to clarify, this.myBodies is an array of 10,000 structs like the following:
[Cudafy(eCudafyType.Struct)]
[StructLayout(LayoutKind.Sequential)]
internal struct Body
{
    public float Mass;
    public Vector Position;
    public Vector Speed;
}
Vector is a struct with two floats, X and Y.
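For reference, here is a minimal sketch of that layout together with the amount of data each copy moves. The Cudafy attributes are left out so the snippet compiles without the library, and TransferSize is a hypothetical helper that is not part of my project. Under these assumptions each Body is five floats (20 bytes), so one copy of 10,000 bodies is roughly 200 KB.

```csharp
using System;
using System.Runtime.InteropServices;

// Two floats, as described above (Cudafy attribute omitted in this sketch).
[StructLayout(LayoutKind.Sequential)]
internal struct Vector
{
    public float X;
    public float Y;
}

// Same fields as Body in the question (Cudafy attribute omitted in this sketch).
[StructLayout(LayoutKind.Sequential)]
internal struct Body
{
    public float Mass;
    public Vector Position;
    public Vector Speed;
}

// Hypothetical helper, only here to work out the transfer size.
internal static class TransferSize
{
    public const int BodyCount = 10000;

    // Marshalled size of one Body: 5 floats of 4 bytes each.
    public static int BytesPerBody()
    {
        return Marshal.SizeOf(typeof(Body));
    }

    // Bytes moved by one CopyToDevice or CopyFromDevice call.
    public static long TotalBytes()
    {
        return (long)BodyCount * BytesPerBody();
    }

    private static void Main()
    {
        Console.WriteLine("Bytes per body: " + BytesPerBody());
        Console.WriteLine("Bytes per copy: " + TotalBytes());
    }
}
```

So the array itself is small, which is why the copy-back time surprises me.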
According to my profiler, the average timings for those three lines are 0.092, 0.192 and 222.873 ms. These timings were taken on a Windows 7 machine with an NVIDIA NVS 310.
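One thing that might be worth checking is whether the 222 ms is really transfer time. CUDA kernel launches are generally asynchronous, so if Launch returns before the kernel has finished, the wait for the kernel would be counted against CopyFromDevice. Below is a rough sketch of how the three stages could be timed separately. It assumes that GPGPU.Synchronize() blocks until the kernel completes and that Body is the struct shown above. GpuTiming and TimeSteps are hypothetical names used only for this illustration.

```csharp
using System;
using System.Diagnostics;
using Cudafy.Host;

// Hypothetical helper, not part of the original project.
internal static class GpuTiming
{
    // Times upload, kernel and download separately.
    // Body is the struct from the question; gpuBodies is the device-side array.
    public static void TimeSteps(GPGPU gpu, Body[] bodies, Body[] gpuBodies, float dt)
    {
        Stopwatch watch = Stopwatch.StartNew();
        gpu.CopyToDevice(bodies, gpuBodies);
        double uploadMs = watch.Elapsed.TotalMilliseconds;

        watch.Restart();
        // Same launch call as in WithGPU above.
        gpu.Launch(1024, 1, "MoveBodies", -1, dt, gpuBodies);
        // Assumption: this waits for the kernel, so its run time is
        // measured here rather than inside the copy that follows.
        gpu.Synchronize();
        double kernelMs = watch.Elapsed.TotalMilliseconds;

        watch.Restart();
        gpu.CopyFromDevice(gpuBodies, bodies);
        double downloadMs = watch.Elapsed.TotalMilliseconds;

        Console.WriteLine("Upload:   {0:F3} ms", uploadMs);
        Console.WriteLine("Kernel:   {0:F3} ms", kernelMs);
        Console.WriteLine("Download: {0:F3} ms", downloadMs);
    }
}
```

If most of the time moves to the kernel line under this measurement, the copy itself is probably not the bottleneck and the kernel or its launch configuration would be the thing to look at.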
Is there a way to improve the time of the CopyFromDevice() method?
Thank you