OOO (out of order) queues are available to meet the needs of user event dependency. Having a single queue in this type of applications can lead to a blocked queue waiting to a user event that never comes. And creating one queue per job is also non optimal.
If you want parallelism int the execution, OOO is NOT what you need. But multiple queues.
A common approach is to use a Queue for IO, and a queue for running kernels.
But you can also use a queue per thread, in a multi-thread processing scheme. IO of each thread will overlap the execution of other threads.
NOTE: nVIDIA does support parallel execution of jobs in a single queue, but that is out of the standard.