
To measure the execution time of an OpenCL kernel, we can use either:

1- CPU timers. Since OpenCL enqueue functions are non-blocking, we need to call clFinish() before reading the timer so that the kernel has actually completed when the interval is closed.

2- GPU timers, i.e. calling clGetEventProfilingInfo() after setting the CL_QUEUE_PROFILING_ENABLE flag in the properties argument of either clCreateCommandQueue() or clSetCommandQueueProperty().
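A minimal sketch of both measurements in C. To keep it compilable without an OpenCL SDK, `uint64_t` stands in for `cl_ulong` and the actual OpenCL calls appear only in a comment. The helper names `event_elapsed_ms` and `cpu_now_ms` are hypothetical, and `clock_gettime(CLOCK_MONOTONIC)` assumes a POSIX host. The profiling values returned for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END are device timestamps in nanoseconds.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* GPU timer (hypothetical helper): converts the START/END profiling
   timestamps of one event, in nanoseconds, to milliseconds.
   uint64_t stands in for cl_ulong so this compiles without OpenCL headers. */
double event_elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    return (double)(end_ns - start_ns) / 1.0e6;
}

/* CPU timer (hypothetical helper): monotonic wall-clock time in ms. */
double cpu_now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1.0e3 + (double)ts.tv_nsec / 1.0e6;
}

/* Typical use with a real OpenCL queue (not compiled here; the queue must
 * have been created with CL_QUEUE_PROFILING_ENABLE):
 *
 *   cl_event evt;
 *   double t0 = cpu_now_ms();
 *   clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gsize, NULL, 0, NULL, &evt);
 *   clFinish(queue);               // block until the kernel has finished
 *   double cpu_ms = cpu_now_ms() - t0;
 *
 *   cl_ulong start, end;
 *   clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START, sizeof start, &start, NULL);
 *   clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,   sizeof end,   &end,   NULL);
 *   double gpu_ms = event_elapsed_ms(start, end);
 */
```

The CPU interval also covers the enqueue call and the clFinish() round trip, so it is expected to be somewhat larger than the event-based figure.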

How can the operating system and the driver version affect the accuracy of the timers used to measure kernel execution time?

All I know is that we need to warm up the device with at least one kernel call to absorb the latency of OpenCL resource allocation at the very beginning.
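The warm-up idea can be sketched as a small timing harness: run the kernel a few times without recording anything, then average the timed runs. Everything here is hypothetical: `run_once_fn`, `timed_average_ms`, and the stand-in `fake_run_once`. In real code the callback would enqueue the kernel, call clFinish() (or wait on its event), and return either the CPU-measured or the event-based time in milliseconds.

```c
#include <stddef.h>

/* Hypothetical callback: runs the kernel once and returns its time in ms. */
typedef double (*run_once_fn)(void *ctx);

/* Discards `warmups` runs, then returns the mean of `runs` timed runs. */
double timed_average_ms(run_once_fn run_once, void *ctx, int warmups, int runs)
{
    for (int i = 0; i < warmups; ++i)
        (void)run_once(ctx);   /* discarded: absorbs one-time setup costs */

    double total = 0.0;
    for (int i = 0; i < runs; ++i)
        total += run_once(ctx);
    return runs > 0 ? total / runs : 0.0;
}

/* Stand-in for a real kernel launch, for illustration only: the first call
   is slow (cold start), every later call takes 2 ms. ctx counts calls. */
static double fake_run_once(void *ctx)
{
    int *calls = (int *)ctx;
    return (*calls)++ == 0 ? 100.0 : 2.0;
}
```

With one warm-up run, the stand-in averages 2 ms; without it, the slow first call inflates the mean of four runs to 26.5 ms.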

mmain

1 Answer


1- You will not get accurate timings from CPU timers alone: the kernel launch is non-blocking, the measured interval includes the time spent in the driver, and it can also vary because of OS context switches.

2- GPU timers rely on GPU hardware counters, and reading those counters through events gives you the most accurate timings you can get. Since neither the CPU nor the OS touches the GPU hardware counters, the OS has no effect on them. The only thing that can affect them is the driver, depending on how it handles the hardware counters.
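One way to see where the time goes is to read all four profiling timestamps of an event (CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END, all in nanoseconds) and split them into queueing, submission and execution intervals. The sketch below is a hypothetical helper that does only that arithmetic; `uint64_t` again stands in for `cl_ulong`. How precisely the QUEUED and SUBMIT stamps are recorded is implementation-dependent, and the counter granularity can be queried with CL_DEVICE_PROFILING_TIMER_RESOLUTION, which is where the driver can influence the result.

```c
#include <stdint.h>

/* Hypothetical breakdown of one event's profiling timestamps, in ms. */
typedef struct {
    double queued_to_submit_ms; /* waiting in the command queue on the host */
    double submit_to_start_ms;  /* submitted to the device, not yet running */
    double kernel_ms;           /* actual execution on the device */
} event_breakdown;

/* Inputs are the nanosecond values returned by clGetEventProfilingInfo()
   for CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END. */
event_breakdown breakdown_ms(uint64_t queued, uint64_t submit,
                             uint64_t start, uint64_t end)
{
    event_breakdown b;
    b.queued_to_submit_ms = (double)(submit - queued) / 1.0e6;
    b.submit_to_start_ms  = (double)(start - submit) / 1.0e6;
    b.kernel_ms           = (double)(end - start) / 1.0e6;
    return b;
}
```

Only `kernel_ms` corresponds to the kernel itself; a CPU timer around enqueue plus clFinish() sees all three intervals plus the host-side call overhead.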

The warm-up run absorbs one-time costs such as data transfers and memory allocation, so it does not affect how the hardware counters behave.

parallel highway