Hi, I am using the CAPS OpenACC compiler, and something strange happens when I try to get some preliminary profiling results.
First, I ran the code with HMPPRT_LOG_LEVEL="info" set, which produces a timestamped log:
[ 2.612337] ( 0) INFO : Upload edgelengths[0:129600] (element_size=8, queue=none, location=gravity_openacc.c:50)
[ 2.613485] ( 0) INFO : Call __hmpp_acc_region__2ha750yb (queue=none, location=gravity_openacc.c:50)
[ 2.614367] ( 0) INFO : Free edgelengths[0:129600] (element_size=8, queue=none, location=gravity_openacc.c:50)
So I guess the kernel execution time is the interval between the Call and Free entries: 2.614367 - 2.613485 = 0.000882 s.
But when I set CUDA_PROFILE=1, the profile below is produced:
method=[ __hmpp_acc_region__2ha750yb_parallel_region_1 ] gputime=[ 492.480 ] cputime=[ 13.000 ] occupancy=[ 0.250 ]
So I'm quite confused by these two results. Which one is the real kernel execution time?
Does anyone have an explanation?
Thanks!