Surface reference faster than Surface object

Question

I recently switched my algorithm from a surface reference to a surface object, and noticed that the program now runs slower.

Here is a comparison for a simple example in which I fill a 3D float array (400 x 400 x 400) with a constant value.

Surface reference API

Time: 9.068928 ms

surface<void, cudaSurfaceType3D> s_volumeSurf;
...
surf3Dwrite(value, s_volumeSurf, px*sizeof(float), py, pz, cudaBoundaryModeTrap);

Surface object API

Time: 14.960256 ms

cudaSurfaceObject_t l_volSurfObj;
...
surf3Dwrite(value, l_volSurfObj, px*sizeof(float), py, pz, cudaBoundaryModeTrap);

This was tested on a GTX 680 with Compute Capability 3.0 and CUDA 5.0.

Does anyone have an explanation for this difference?

I measured the times with CUDA events (cudaEventRecord, cudaEventSynchronize and cudaEventElapsedTime) – Arnaud, May 28 '13 at 12:26
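For anyone who wants to reproduce the measurement, here is a minimal sketch of the surface object path timed with CUDA events. It is not the asker's code: the names fillKernel and timeSurfaceObjectFill, the 8 x 8 x 8 block size, and the untimed warm-up launch are all assumptions made for illustration. Only the object API is shown, since surface references were removed from the toolkit in CUDA 12.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per voxel writes a constant into the surface.
__global__ void fillKernel(cudaSurfaceObject_t surf, int n, float value)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    int pz = blockIdx.z * blockDim.z + threadIdx.z;
    if (px < n && py < n && pz < n) {
        // The x coordinate is given in bytes, as in the question's snippet.
        surf3Dwrite(value, surf, px * (int)sizeof(float), py, pz, cudaBoundaryModeTrap);
    }
}

// Hypothetical helper: fills an n x n x n float array through a surface object
// and returns the kernel time in milliseconds, or -1.0f if any CUDA call fails.
float timeSurfaceObjectFill(int n, float value)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(n, n, n);  // in elements for CUDA arrays
    cudaArray_t array = nullptr;
    if (cudaMalloc3DArray(&array, &desc, extent, cudaArraySurfaceLoadStore) != cudaSuccess)
        return -1.0f;

    // Bind the array to a surface object; the handle is passed as a kernel argument.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;
    cudaSurfaceObject_t surf = 0;
    if (cudaCreateSurfaceObject(&surf, &resDesc) != cudaSuccess) {
        cudaFreeArray(array);
        return -1.0f;
    }

    dim3 block(8, 8, 8);  // assumed block size, not taken from the question
    dim3 grid((n + block.x - 1) / block.x,
              (n + block.y - 1) / block.y,
              (n + block.z - 1) / block.z);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Untimed warm-up launch (an assumption) so one-time setup cost is excluded.
    fillKernel<<<grid, block>>>(surf, n, value);
    cudaDeviceSynchronize();

    // Timed launch, bracketed by events as described in the comment above.
    cudaEventRecord(start);
    fillKernel<<<grid, block>>>(surf, n, value);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaDestroySurfaceObject(surf);
    cudaFreeArray(array);
    return (err == cudaSuccess) ? ms : -1.0f;
}
```

Calling timeSurfaceObjectFill(400, 1.0f) would correspond to the 400 x 400 x 400 case in the question; the time it returns will depend on the GPU and toolkit version, so it should not be expected to match the figures above.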

score 7 · Accepted Answer · answered Jul 16 '13 at 06:20

In the surface object case, the surface descriptors are fetched from global memory. In the surface reference case, the descriptors are compiled into constant memory, and fetching them from there may be much faster than a global memory access. If your kernel is small enough, or the L1 cache is disabled, you could observe a significant performance loss.

You can diff the SASS code of the two versions (for example, as dumped with cuobjdump -sass) to see the difference.

longlee

So, would you say that it's generally recommended to stick with surface references instead? When would using a surface object be preferable? – Benjamin Bray Dec 25 '18 at 19:14
