Measuring kernel execution time and work allocation on a GPU

Question

I am executing a parallel kernel on a GPU using OpenCL and JOCL.

I want to know:

1/ Are there any functions that report the kernel size in terms of work-items and work-groups, and how it is executed on my Nvidia GPU platform?

2/ Is it possible to measure the execution time of the kernel without the GPU/CPU data transfers? I called Java's System.currentTimeMillis() before starting the kernel and after it finished, but that includes the data transfer time.

3/ More precisely, is it possible to know the execution time of each GPU core?

There is a dedicated example showing how to obtain the execution time of kernels using events, at http://jocl.org/samples/JOCLEventSample.java — Marco13, Sep 20 '16 at 15:39

score 0 · Accepted Answer · answered Sep 19 '16 at 22:39

1) Inside the kernel:

get_global_size(0) gives the number of work-items in the x dimension
get_global_size(1) gives the number of work-items in the y dimension (how many rows of items there are)
get_global_size(2) gives the number of work-items in the z dimension (how many matrices of items there are)

The total number of work-items is the product of these values. If the kernel is launched with only one dimension, the first call alone is enough.

get_local_size(0 or 1 or 2)

gives the same thing per work-group: the number of work-items in one group, not the total number of work-items.

get_num_groups(0 or 1 or 2)

is similar but gives the number of work-groups in that dimension.

The number of dimensions is returned by

 int dims = get_work_dim();
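
The relationship between these values can also be checked on the host before the launch. The following is a minimal sketch in plain Java (no JOCL calls); the class and method names are hypothetical, and it assumes the global size is an exact multiple of the local size in every dimension, so that get_num_groups(d) equals global size divided by local size.

```java
import java.util.Arrays;

// Hypothetical helper, not part of JOCL: mirrors the values the kernel
// would see through get_global_size, get_local_size and get_num_groups.
final class WorkSizes {

    /** Total work-items: the product of get_global_size(d) over all dimensions. */
    static long totalWorkItems(long[] globalSize) {
        long total = 1;
        for (long g : globalSize) {
            total *= g;
        }
        return total;
    }

    /** Work-groups per dimension, assuming global % local == 0 in each dimension. */
    static long[] numGroups(long[] globalSize, long[] localSize) {
        if (globalSize.length != localSize.length) {
            throw new IllegalArgumentException("dimension count mismatch");
        }
        long[] groups = new long[globalSize.length];
        for (int d = 0; d < globalSize.length; d++) {
            if (globalSize[d] % localSize[d] != 0) {
                throw new IllegalArgumentException(
                        "global size is not a multiple of local size in dimension " + d);
            }
            groups[d] = globalSize[d] / localSize[d];
        }
        return groups;
    }

    public static void main(String[] args) {
        long[] global = {1024, 8};  // illustrative 2-D launch
        long[] local = {64, 4};     // illustrative work-group shape
        System.out.println("total work-items: " + totalWorkItems(global));
        System.out.println("work-groups per dimension: "
                + Arrays.toString(numGroups(global, local)));
    }
}
```

The two arrays correspond to the global and local work sizes that the host passes when it enqueues the kernel.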

2) Event-based profiling queries from the host code:

http://www.jocl.org/cloth/docs/doc-utils/org/jocl/utils/Events.html

computeExecutionTimeMs(org.jocl.cl_event event): computes the execution time for the given event, in milliseconds.
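
Underneath, OpenCL event profiling reports device timestamps in nanoseconds: with a command queue created with CL_QUEUE_PROFILING_ENABLE, clGetEventProfilingInfo can be queried for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on the kernel's event. Because the event belongs to the kernel launch alone, the difference excludes host/device transfers. The sketch below shows only the final arithmetic in plain Java; the class name is hypothetical and the timestamps are made-up sample values, not measurements.

```java
// Hypothetical helper, not part of JOCL: converts the two profiling
// timestamps of a kernel event (nanoseconds) into milliseconds.
final class KernelTiming {

    /** Elapsed time between COMMAND_START and COMMAND_END, in milliseconds. */
    static double elapsedMs(long startNanos, long endNanos) {
        if (endNanos < startNanos) {
            throw new IllegalArgumentException("end timestamp precedes start timestamp");
        }
        return (endNanos - startNanos) / 1e6;
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for the results of
        // clGetEventProfilingInfo(..., CL_PROFILING_COMMAND_START / _END, ...).
        long start = 1_000_000L;
        long end = 3_500_000L;
        System.out.println("kernel time: " + elapsedMs(start, end) + " ms");
    }
}
```

Timing the same launch with System.currentTimeMillis() on the host would additionally include enqueue overhead and any transfers issued between the two calls.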

For 1), 2) and 3): a profiler

can show all of this except the "each core" part. It does report per-"lane" information; a lane may not map to the same core at all times, but you can see what a single thread was doing. The visuals and tables in https://developer.nvidia.com/nvidia-nsight-visual-studio-edition give enough information about bottlenecks and kernel hotspots.

Thank you. I have a question about the profiler: can I use it with Eclipse? The link mentions only Visual Studio. — Nasima Info, Sep 23 '16 at 10:34
If it works by "attaching" to a process, then there is a chance it can be attached to the JVM or to whatever process is executing the jar. Maybe some kind of command-line run from Visual Studio could work too. — huseyin tugrul buyukisik, Sep 23 '16 at 10:58
