Is there any way to profile part of a CUDA kernel rather than the kernel as a whole? My kernel calls several device functions, and I want to measure how long each of them takes. Are there any flags, events, or instructions I can set so that these timings show up in the NVIDIA Visual Profiler? Or do I need to do it manually by inserting cudaEventCreate and similar functions?

Gilles Gouaillardet

user2390724
1 Answer
You can time specific parts of your kernel manually using the clock() or clock64() functions:
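__device__ unsigned long long time_spent;  // total cycles, summed over all threads

__global__ void kernel(...)
{
    unsigned int t1, t2;
    // ...
    t1 = clock();  // per-SM cycle counter, truncated to 32 bits
    // code of interest
    t2 = clock();
    // unsigned subtraction stays correct across a single counter wraparound
    atomicAdd(&time_spent, (unsigned long long)(t2 - t1));
}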
`clock()` officially returns a `clock_t`, but I prefer the explicit use of `unsigned int` to make it obvious how the above code correctly handles wraparound of clock values (as long as the timed code does not take more than 2^32-1 cycles to complete).
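The wraparound argument can be checked on the host without a GPU. The sketch below is plain C++ that nvcc should also accept as host code; `elapsed_cycles` is a hypothetical helper introduced only for illustration, and the same unsigned arithmetic applies to `t2 - t1` inside the kernel.

```cpp
#include <cstdint>

// Hypothetical helper (not part of CUDA): elapsed cycles between two samples
// of a 32-bit counter. Unsigned subtraction is performed modulo 2^32, so the
// result is correct even if the counter wrapped once between t1 and t2.
constexpr std::uint32_t elapsed_cycles(std::uint32_t t1, std::uint32_t t2)
{
    return t2 - t1;
}

// No wraparound: the plain difference.
static_assert(elapsed_cycles(100u, 350u) == 250u, "no-wrap case");

// Wraparound: the counter ran from 0xFFFFFFF0 past zero to 0x00000010,
// which is 0x10 cycles up to the wrap plus 0x10 after it.
static_assert(elapsed_cycles(0xFFFFFFF0u, 0x00000010u) == 0x20u, "wrap case");
```

If the timed region could exceed 2^32-1 cycles, `clock64()` with `unsigned long long` samples would be the safer choice.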
Make sure to also time the code with
t1 = clock();
t2 = clock();
back-to-back so you can subtract the timing overhead.
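Putting the pieces together, a minimal end-to-end sketch might look like the following. The kernel body, the `time_overhead` counter, the problem size, and the launch configuration are all assumptions made for illustration; error checking is omitted for brevity, and the compiler is free to move instructions across the `clock()` calls, so the numbers should be treated as rough estimates.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Cycle totals summed over all threads. time_overhead is an illustrative
// second counter that holds the cost of the back-to-back clock() reads.
__device__ unsigned long long time_spent;
__device__ unsigned long long time_overhead;

__global__ void kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Back-to-back reads: measures the timing overhead itself.
    unsigned int t1 = clock();
    unsigned int t2 = clock();
    atomicAdd(&time_overhead, (unsigned long long)(t2 - t1));

    t1 = clock();
    // Placeholder for the code of interest.
    float x = data[i];
    for (int k = 0; k < 100; ++k) x = x * 1.0001f + 0.5f;
    data[i] = x;  // store the result so the work is not optimized away
    t2 = clock();
    atomicAdd(&time_spent, (unsigned long long)(t2 - t1));
}

int main()
{
    const int n = 1 << 16;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Reset both device counters before the launch.
    unsigned long long zero = 0;
    cudaMemcpyToSymbol(time_spent, &zero, sizeof(zero));
    cudaMemcpyToSymbol(time_overhead, &zero, sizeof(zero));

    kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();

    unsigned long long spent = 0, overhead = 0;
    cudaMemcpyFromSymbol(&spent, time_spent, sizeof(spent));
    cudaMemcpyFromSymbol(&overhead, time_overhead, sizeof(overhead));

    // Average per-thread cycles with the timing overhead subtracted.
    double avg = ((double)spent - (double)overhead) / n;
    printf("average cycles per thread (overhead removed): %.1f\n", avg);

    cudaFree(d_data);
    return 0;
}
```

Accumulating with `atomicAdd` keeps the example short but adds contention of its own; since the atomics sit outside the timed regions, they should not inflate the measured differences.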

tera
- Is it not that `clock()` returns a local counter per SM? – KiaMorot May 30 '13 at 11:55
- In addition to this answer: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#time-function – KiaMorot May 30 '13 at 12:00
- Yes, it does. But that does not matter as long as we only take differences between times from the same SM. (Dynamic parallelism would indeed create a problem here; for simplicity I've just assumed that the code of interest does not launch any other kernels.) – tera May 30 '13 at 14:44