I'm using the CUDA 7.0 profiler, nvprof, to profile some process making CUDA calls:
$ nvprof -o out.nvprof /path/to/my/app
Later, I generate two traces: the 'API trace' (what happens on the host CPU, e.g. CUDA runtime calls and ranges you mark) and the 'GPU trace' (kernel executions, memsets, H2Ds, D2Hs and so on):
$ nvprof -i out.nvprof --print-api-trace --csv 2>&1 | tail -n +2 > api-trace.csv
$ nvprof -i out.nvprof --print-gpu-trace --csv 2>&1 | tail -n +2 > gpu-trace.csv
Every record in each of the traces has a timestamp (or a start and end time). The thing is, time value 0 in these two traces is not the same: The GPU trace time-0 point seems to signify when the first operation on the GPU triggered by the relevant process begins to execute, while the API trace's time-0 point seems to be the beginning of process execution, or sometime thereabouts.
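To make the mismatch concrete, here is a minimal Python sketch that pulls the earliest start time out of one of these CSV files. It does not compute the offset I'm asking about; it only shows what each trace reports as its first timestamp. It assumes that, after the `tail -n +2` above, the first row is the column header, the second row holds the units, and the timestamp column is named `Start`; I have not verified this across nvprof versions. The helper name `first_start` is made up for illustration.

```python
import csv
import io


def first_start(csv_text):
    """Return (earliest_start, unit) from the text of an nvprof CSV trace.

    Assumed layout (not verified for every nvprof version):
      row 0: column names, including "Start"
      row 1: units for each column
      rows 2+: one record per API call or GPU operation
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, units, records = rows[0], rows[1], rows[2:]
    col = header.index("Start")
    # Skip blank or truncated lines instead of failing on them.
    starts = [float(r[col]) for r in records if len(r) > col and r[col]]
    return min(starts), units[col]
```

Calling `first_start(open("api-trace.csv").read())` and the same for `gpu-trace.csv` gives the two values I'm comparing. The unit is returned with the value in case the two files are not scaled the same way.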
I've also noticed that when I use nvvp and import out.nvprof, the values are corrected; that is to say, the start time of the first GPU op is not 0, but something more realistic.
How do I obtain the correct offset between the two traces?