I have some CUDA kernels I want to run in individual pthreads.
Each pthread has to execute, say, three CUDA kernels, and those kernels must run sequentially.
My idea was to pass each pthread a reference to its own stream, so that its three kernels would all execute in order within that stream.
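Roughly, this is the structure I have in mind. It is only a sketch: the kernels `stepA`, `stepB`, and `stepC`, the `Job` struct, and the sizes are placeholders rather than my real code, and it assumes every thread uses device 0 through the runtime API.

```cuda
#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

#define NUM_THREADS 4   // hypothetical number of pthreads
#define N 1024          // hypothetical number of elements per thread

// Placeholder kernels standing in for the three real ones.
__global__ void stepA(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = 1.0f;
}
__global__ void stepB(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 2.0f;
}
__global__ void stepC(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 3.0f;
}

// Per-thread work description (hypothetical): one stream and one buffer each.
struct Job {
    cudaStream_t stream;
    float *d_data;
    float result;
};

static void *worker(void *arg) {
    Job *job = static_cast<Job *>(arg);
    cudaSetDevice(0);  // assumption: all threads target device 0

    const int block = 256;
    const int grid = (N + block - 1) / block;

    // Launches issued to the same stream execute in issue order.
    stepA<<<grid, block, 0, job->stream>>>(job->d_data, N);
    stepB<<<grid, block, 0, job->stream>>>(job->d_data, N);
    stepC<<<grid, block, 0, job->stream>>>(job->d_data, N);

    // Copy one element back on the same stream, then wait for this stream only.
    cudaMemcpyAsync(&job->result, job->d_data, sizeof(float),
                    cudaMemcpyDeviceToHost, job->stream);
    if (cudaStreamSynchronize(job->stream) != cudaSuccess)
        job->result = -1.0f;  // marks a failed run in this sketch
    return NULL;
}

int main() {
    Job jobs[NUM_THREADS];
    pthread_t threads[NUM_THREADS];

    cudaSetDevice(0);
    for (int i = 0; i < NUM_THREADS; ++i) {
        jobs[i].result = 0.0f;
        cudaStreamCreate(&jobs[i].stream);
        cudaMalloc((void **)&jobs[i].d_data, N * sizeof(float));
        pthread_create(&threads[i], NULL, worker, &jobs[i]);
    }
    for (int i = 0; i < NUM_THREADS; ++i) {
        pthread_join(threads[i], NULL);
        // (1 + 2) * 3 = 9 if the three kernels ran in order.
        printf("thread %d: result = %f\n", i, jobs[i].result);
        cudaFree(jobs[i].d_data);
        cudaStreamDestroy(jobs[i].stream);
    }
    return 0;
}
```

The streams are created in the main thread and handed to the workers by pointer, so each worker only ever touches its own stream and buffer.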
I could get this working by giving each pthread its own context, in which it would then execute its kernels as normal, but that seems to add a lot of overhead.
So how do I make each pthread work in the same context, concurrently with the other pthreads?
Thanks