On an Nvidia card I used to create a lot of command queues and enqueue one kernel in each queue, which made the kernels execute in parallel. It really sped up my program.
But now I use a Radeon card and this trick doesn't work anymore. I can see in the profiler that the device waits for the previous kernel to finish before it starts executing the next one, even if the kernels are enqueued in different queues.
So the question is: how can I make a Radeon card execute command queues in parallel without dividing the device into sub-devices?
Maybe I should use some custom driver?

ololo

1 Answer

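It does sound like a driver issue, but maybe your card can only handle one queue at a time. In that case, you can try enqueuing all your kernels into a single out-of-order command queue, which allows the runtime to start a command without waiting for earlier ones to finish. Look at the documentation for CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, the property you pass to clCreateCommandQueue, for more information on this.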

Jan-Gerd