Is an out-of-order command queue useful on an AMD GPU?

Question

It seems to me that a single OpenCL command queue won't dispatch commands to more than one hardware queue. So are commands in an out-of-order command queue still executed one at a time, just not in the order they were enqueued?

So if I want to make use of multiple hardware queues, is my only option to create multiple OpenCL command queues?

I tried this on an HD 7870, and performance maxed out at just 2 instances per GPU, where each instance is a separate context with its own queues rather than a single context with multiple OOO command queues. So a single context with multiple OOO command queues, say 3 or 4 at the same time (synchronized at an explicit meeting point), would be better. But they say it's better on NVIDIA. — huseyin tugrul buyukisik, Sep 06 '15 at 08:56
I ran a lot of tests on this a long time ago. Even if GPU usage hits 100%, real performance does not improve with multiple contexts. I guess it is just the way usage is measured: time spent in context switching is counted as valid usage too. A single context is as fast as multiple contexts, even though usage does not hit 100% in the single-context case. — DarkZeros, Sep 07 '15 at 10:25

score 2 · Accepted Answer · answered Sep 07 '15 at 10:22

OOO (out-of-order) queues exist to meet the needs of user-event dependencies. With a single in-order queue, this type of application can end up with the queue blocked, waiting on a user event that never comes. Creating one queue per job is not optimal either.
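To make the blocking scenario concrete, here is a minimal sketch in plain Python. It is a toy model, not the OpenCL API: `Command`, `drain` and the event name `user_event` are all hypothetical names made up for illustration. Commands still run one at a time; the only difference between the two modes is whether a command stuck on its wait list holds up everything behind it.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    name: str
    wait_for: set = field(default_factory=set)  # events that must complete first


def drain(commands, signalled, in_order):
    """Return the names of the commands that get to run, in execution order.

    Toy model, not the OpenCL API: commands run one at a time, and a finished
    command signals an event named after itself.
    """
    done = set(signalled)
    pending = list(commands)
    executed = []
    while pending:
        if in_order:
            # In-order queue: only the head may run, so a blocked head stalls everything.
            ready = pending[0] if pending[0].wait_for <= done else None
        else:
            # Out-of-order queue: the first command whose wait list is satisfied may run.
            ready = next((c for c in pending if c.wait_for <= done), None)
        if ready is None:
            break
        pending.remove(ready)
        executed.append(ready.name)
        done.add(ready.name)
    return executed


commands = [
    Command("A", {"user_event"}),  # waits on a user event (hypothetical name)
    Command("B"),
    Command("C", {"B"}),
]

print(drain(commands, set(), in_order=True))             # []
print(drain(commands, set(), in_order=False))            # ['B', 'C']
print(drain(commands, {"user_event"}, in_order=False))   # ['A', 'B', 'C']
```

In this model, while the user event is unsignalled the in-order queue runs nothing, whereas the out-of-order queue still gets through B and C. Nothing runs concurrently in either case.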

If you want parallelism in the execution, OOO is NOT what you need; multiple queues are.

A common approach is to use one queue for I/O and another for running kernels. You can also use one queue per thread in a multi-threaded processing scheme; the I/O of each thread will then overlap with the execution of other threads.
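As a rough illustration of why the split helps, the following back-of-the-envelope model compares one in-order queue against an I/O queue plus a compute queue. It is plain Python rather than OpenCL, the function names and job durations are made up, and it assumes the device really does run the two queues concurrently, which depends on the hardware and driver.

```python
def single_queue_time(jobs):
    """One in-order queue: every transfer and kernel runs back to back."""
    return sum(transfer + kernel for transfer, kernel in jobs)


def two_queue_time(jobs):
    """One I/O queue plus one compute queue, each of them in-order.

    Kernel i waits for transfer i (an event dependency across queues) and for
    the previous kernel (because the compute queue itself is in-order).
    """
    io_free = 0        # time at which the I/O queue finishes its last transfer
    compute_free = 0   # time at which the compute queue finishes its last kernel
    for transfer, kernel in jobs:
        io_free += transfer                 # transfers run back to back
        start = max(io_free, compute_free)  # wait for the data and for the queue
        compute_free = start + kernel
    return compute_free


# Hypothetical workload: 4 jobs, each with 2 time units of transfer and 3 of kernel.
jobs = [(2, 3)] * 4

print(single_queue_time(jobs))  # 20
print(two_queue_time(jobs))     # 14
```

Under these assumptions, only the first transfer is exposed; later transfers hide behind the kernels of earlier jobs.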

NOTE: NVIDIA does support parallel execution of jobs in a single queue, but that is outside the standard.
