Context
I'm porting a complex CUDA application to SYCL. The application uses multiple cudaStream objects to launch its kernels, and in some cases it also uses the default stream, which forces a device-wide synchronization.
Problem
CUDA streams map quite easily to in-order SYCL queues. However, when I encounter a device-wide synchronization point (i.e. cudaDeviceSynchronize()), I must explicitly wait on all the queues, because queue::wait() only waits on the commands submitted to that queue.
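For reference, the workaround I'm using now looks roughly like the sketch below. The helper name wait_all is my own and is not part of SYCL. It is templated on the queue type only so that the snippet stands alone without a SYCL toolchain; in the real port, Queue would be sycl::queue.

```cpp
#include <vector>

// Hypothetical helper (not a SYCL API): calls wait() on every queue in the
// list, emulating a device-wide synchronization point.
// Queue is a template parameter so the sketch does not depend on a SYCL
// toolchain; in the actual port it would be sycl::queue.
template <typename Queue>
void wait_all(std::vector<Queue>& queues) {
    for (auto& q : queues) {
        q.wait();  // blocks until all commands submitted to q have completed
    }
}
```

In the port, every in-order queue created for the device (for example with sycl::property::queue::in_order) is stored in one such vector, and each cudaDeviceSynchronize() call becomes wait_all(queues). This works, but it requires tracking every queue by hand.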
Question
Is there a way to wait on all the commands for a specific device, without having to explicitly call wait() on every queue?