I am trying to figure out what a profile result means, before I start to optimize. I am very new with CUDA and profiling in general and I am confused by the result.
Specifically, I want to know what is happening during seemingly unoccupied chunks of computation. When I look from top to bottom at the CPU and GPU there appears to be nothing happening during large portions of the code. These look like columns with nothing in Thread1
and nothing in GeForce
. Is this normal? Whats happening here?
The run was done a multicore machine under no load with nvprof. The GPU code was compiled with -arch=sm_20 -m32 -g -G
for CUDA 5.