I have a fairly simple OpenCL kernel that does nothing beyond declaring its input:
__kernel void test(__read_only image3d_t d_multitest){}
My host-side code uses basic pyopencl to transfer an image to the device and run the kernel:
import pyopencl as cl
import numpy as np
platform = cl.get_platforms()[0]
devs = platform.get_devices()
device1 = devs[1]
h_test = np.zeros((64,512,512)).astype(np.float32,order='F')
mf = cl.mem_flags
ctx = cl.Context([device1])
Queue1 = cl.CommandQueue(ctx,properties=cl.command_queue_properties.PROFILING_ENABLE)
Queue2 = cl.CommandQueue(ctx,properties=cl.command_queue_properties.PROFILING_ENABLE)
event_list=[]
with open('Minimal.cl', 'r') as fi:
    fstr = fi.read()
prg = cl.Program(ctx, fstr).build()
knl = prg.test
d_test = cl.Image(ctx,mf.READ_ONLY, cl.ImageFormat(cl.channel_order.INTENSITY, cl.channel_type.FLOAT), h_test.shape)
e1 = cl.enqueue_copy(Queue1, d_test, h_test, is_blocking = False, origin = (0,0,0), region = h_test.shape)
knl.set_args(d_test)
cl.enqueue_nd_range_kernel(Queue2,knl,(512,512,64),None,wait_for=[e1,])
I am profiling this code on different devices and see that the transfer time scales with the memory bandwidth of the device, which is expected. The kernel execution time, on the other hand, varies wildly.
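For reference, the durations come from the events' profiling timestamps, which OpenCL reports in nanoseconds; the queues above were created with PROFILING_ENABLE, so `evt.profile.start` and `evt.profile.end` are available on the returned events. A small conversion helper (the name `profile_ms` is my own, just a sketch):

```python
def profile_ms(start_ns, end_ns):
    """Convert OpenCL event profiling timestamps (nanoseconds) to milliseconds."""
    return (end_ns - start_ns) * 1e-6

# usage with a pyopencl event, e.g. the kernel launch:
# evt = cl.enqueue_nd_range_kernel(Queue2, knl, (512,512,64), None, wait_for=[e1])
# evt.wait()
# print(profile_ms(evt.profile.start, evt.profile.end), "ms")
```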
On Nvidia the kernel execution duration is <<1 ms.
However, the exact same kernel takes >20 ms on AMD's implementation.
My question is whether this kind of overhead is tolerable, or whether I am doing something fundamentally wrong.
Hardware:
NVIDIA GeForce GTX TITAN X
AMD Radeon R9 290X
Host: Ubuntu 16.04