Hi, I am using the CAPS OpenACC compiler, but something strange happens when I try to get some preliminary profiling results.

First, I ran the code with HMPPRT_LOG_LEVEL="info" set, which produces log output with timestamps:

[     2.612337] ( 0) INFO : Upload   edgelengths[0:129600] (element_size=8, queue=none, location=gravity_openacc.c:50)
[     2.613485] ( 0) INFO : Call     __hmpp_acc_region__2ha750yb (queue=none, location=gravity_openacc.c:50)
[     2.614367] ( 0) INFO : Free     edgelengths[0:129600] (element_size=8, queue=none, location=gravity_openacc.c:50)

So I guess the kernel execution time is calculated as 2.614367-2.613485=0.000882 s.

But when I set CUDA_PROFILE=1, the following profile is shown:

method=[ __hmpp_acc_region__2ha750yb_parallel_region_1 ] gputime=[ 492.480 ] cputime=[ 13.000 ] occupancy=[ 0.250 ] 

So I'm quite confused by these two results. Which one is correct?

Does anyone have an explanation?

Thanks!

1 Answer

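The CUDA profiler reports only the time spent executing the CUDA kernel itself on the GPU. Its gputime field is given in microseconds, so the kernel took about 492 microseconds. The timestamps you obtain with HMPPRT_LOG_LEVEL="info" are taken on the host, so the 0.000882 s (882 microseconds) between the Call and Free entries is the overall time to execute the region. That is not exactly the same thing, because the region may also include work done on the host, for example code that runs around the kernel launch. Both numbers are valid; they simply measure different things.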
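To put the two numbers side by side, here is a minimal Python sketch that parses the log lines quoted in the question and converts both measurements to microseconds. The helper names (`hmpp_events`, `gputime_us`) and the regular expressions are my own illustration, not part of the CAPS or CUDA tooling, and the script assumes the log formats look exactly like the excerpts above.

```python
import re

# Log excerpts copied from the question.
HMPP_LOG = """\
[     2.612337] ( 0) INFO : Upload   edgelengths[0:129600] (element_size=8, queue=none, location=gravity_openacc.c:50)
[     2.613485] ( 0) INFO : Call     __hmpp_acc_region__2ha750yb (queue=none, location=gravity_openacc.c:50)
[     2.614367] ( 0) INFO : Free     edgelengths[0:129600] (element_size=8, queue=none, location=gravity_openacc.c:50)
"""
CUDA_PROFILE_LINE = (
    "method=[ __hmpp_acc_region__2ha750yb_parallel_region_1 ] "
    "gputime=[ 492.480 ] cputime=[ 13.000 ] occupancy=[ 0.250 ]"
)


def hmpp_events(log):
    """Hypothetical helper: return (timestamp_seconds, event_name) pairs."""
    pattern = re.compile(r"\[\s*([0-9.]+)\]\s*\(\s*\d+\)\s*INFO\s*:\s*(\w+)")
    return [(float(m.group(1)), m.group(2)) for m in pattern.finditer(log)]


def gputime_us(line):
    """Hypothetical helper: extract gputime (microseconds) from a profile line."""
    match = re.search(r"gputime=\[\s*([0-9.]+)\s*\]", line)
    return float(match.group(1))


events = hmpp_events(HMPP_LOG)
call_ts = next(t for t, name in events if name == "Call")
# Host-side interval: from the Call entry to the next logged event.
next_ts = min(t for t, _ in events if t > call_ts)
host_interval_us = (next_ts - call_ts) * 1e6

kernel_us = gputime_us(CUDA_PROFILE_LINE)
# Time in the region that is not accounted for by the kernel itself.
overhead_us = host_interval_us - kernel_us

print(f"host interval : {host_interval_us:.1f} us")
print(f"GPU kernel    : {kernel_us:.1f} us")
print(f"difference    : {overhead_us:.1f} us")
```

The difference is the part of the region that is not kernel execution. The script cannot tell you what that time was spent on; it only shows that the two tools are measuring different intervals.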

user2054656