I recently replaced the surface reference in my algorithm with a surface object, and noticed that the program now runs slower.
Here is a comparison for a simple example in which I fill a 3D float array (400 x 400 x 400) with a constant value.
Surface reference API
Time: 9.068928 ms
surface<void, cudaSurfaceType3D> s_volumeSurf;
...
surf3Dwrite(value, s_volumeSurf, px*sizeof(float), py, pz, cudaBoundaryModeTrap);
Surface object API
Time: 14.960256 ms
cudaSurfaceObject_t l_volSurfObj;
...
surf3Dwrite(value, l_volSurfObj, px*sizeof(float), py, pz, cudaBoundaryModeTrap);
This was tested on a GTX 680 (compute capability 3.0) with CUDA 5.0.
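For context, the surface object variant can be sketched roughly as follows. This is a minimal reconstruction under stated assumptions, not the original benchmark: the kernel name fillVolume, the helper fillAndVerify, the 8x8x8 block size, and the event-based timing are all hypothetical choices made for illustration. Only the surf3Dwrite call matches the snippet above.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes one float through the surface object.
__global__ void fillVolume(cudaSurfaceObject_t surf, int nx, int ny, int nz, float value)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    int pz = blockIdx.z * blockDim.z + threadIdx.z;
    if (px < nx && py < ny && pz < nz) {
        // The x coordinate of a surface write is given in bytes.
        surf3Dwrite(value, surf, px * sizeof(float), py, pz, cudaBoundaryModeTrap);
    }
}

// Hypothetical helper: fills an nx*ny*nz float volume, prints the kernel time,
// and returns true if every voxel read back equals `value`.
bool fillAndVerify(int nx, int ny, int nz, float value)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(nx, ny, nz);
    cudaArray *array = 0;
    // Arrays written through surfaces must be allocated with cudaArraySurfaceLoadStore.
    if (cudaMalloc3DArray(&array, &desc, extent, cudaArraySurfaceLoadStore) != cudaSuccess)
        return false;

    // Describe the array as the resource backing the surface object.
    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;
    cudaSurfaceObject_t surf = 0;
    if (cudaCreateSurfaceObject(&surf, &resDesc) != cudaSuccess) {
        cudaFreeArray(array);
        return false;
    }

    dim3 block(8, 8, 8);  // assumed block size, not taken from the original code
    dim3 grid((nx + block.x - 1) / block.x,
              (ny + block.y - 1) / block.y,
              (nz + block.z - 1) / block.z);

    // Time only the kernel with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    fillVolume<<<grid, block>>>(surf, nx, ny, nz, value);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Time: %f ms\n", ms);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // Copy the volume back to the host and check every voxel.
    std::vector<float> host((size_t)nx * ny * nz, 0.0f);
    cudaMemcpy3DParms p;
    memset(&p, 0, sizeof(p));
    p.srcArray = array;
    p.dstPtr = make_cudaPitchedPtr(&host[0], nx * sizeof(float), nx, ny);
    p.extent = extent;  // in elements, because a CUDA array takes part in the copy
    p.kind = cudaMemcpyDeviceToHost;
    ok = ok && (cudaMemcpy3D(&p) == cudaSuccess);
    for (size_t i = 0; ok && i < host.size(); ++i)
        ok = (host[i] == value);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaDestroySurfaceObject(surf);
    cudaFreeArray(array);
    return ok;
}
```

Calling fillAndVerify(400, 400, 400, 1.0f) from main would correspond to the case measured above. The surface reference variant would differ only in declaring the file-scope surface<void, cudaSurfaceType3D> and binding the array with cudaBindSurfaceToArray instead of creating a surface object.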
Does anyone have an explanation for this difference?