Parallel execution of several OpenCL kernels on Radeon card

Question

On an Nvidia card I used to create many command queues and enqueue one kernel in each queue, which made the kernels execute in parallel. It really sped up my program.
But now I use a Radeon card and this trick doesn't work anymore. I can see in the profiler that before the device starts executing a kernel, it waits for the previous kernel to finish (even if the kernels are enqueued in different queues).
So the question is: how can I make a Radeon card execute command queues in parallel without dividing the device into sub-devices?
Maybe I should use some custom driver?

Post some of your dispatch code. This shouldn't be happening. — 3Dave, Nov 22 '19 at 19:03

score 1 · Answer 1 · answered Nov 22 '19 at 21:15

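It does sound like a driver issue, but maybe your card can only handle one queue at a time. In that case, you can try enqueueing all your kernels into a single out-of-order command queue, i.e. one created with the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property. In such a queue, commands without event dependencies are not required to wait for earlier commands to finish, so the runtime is allowed to run them concurrently. Look at the documentation for CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE for more information on this.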
