On an Nvidia card I used to create many command queues and enqueue one kernel in each queue, which made the kernels execute in parallel. It really sped up my program.
But now I use a Radeon card, and this trick no longer works. The profiler shows that the device waits for the previous kernel to finish before it starts executing the next one, even when the kernels are enqueued in different queues.
So the question is: how can I make a Radeon card execute command queues in parallel without dividing the device into sub-devices?
Maybe I should use a custom driver?

ololo
Post some of your dispatch code. This shouldn't be happening. – 3Dave Nov 22 '19 at 19:03
1 Answer
It does sound like a driver issue, but maybe your card can only handle one queue at a time. In that case, you can try enqueueing all your kernels into a single out-of-order command queue, which allows the runtime to execute independent commands concurrently. See the documentation for CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE for more information.

Jan-Gerd