When I capture stream execution to build a CUDA graph, a call to `thrust::reduce` causes the runtime error `cudaErrorStreamCaptureUnsupported: operation not permitted when stream is capturing`. I have tried returning the reduction result to both host and device variables, and I am running the reduction on the intended stream via `thrust::cuda::par.on(stream)`. Is there any way to add the execution of Thrust functions to CUDA graphs?
1 Answer
Thrust's reduction is a blocking operation on the host side: when it is called from host code, `thrust::reduce` returns its result to the host, so it has to wait for the reduction kernel to finish before it can return. That kind of synchronization is not permitted while a stream is capturing, which is consistent with the `cudaErrorStreamCaptureUnsupported` error you see. I am assuming that you use the result of the reduction as a parameter to one of the kernels that follow. In that case the captured work depends on a host-side value that is not available until the reduction kernel finishes executing. As a solution, you can try adding a host node to your graph that returns the result of the reduction.
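
If the result can stay in device memory, another option is to avoid the host round trip altogether. The sketch below is only an illustration under assumptions that are not in the question: it swaps `thrust::reduce` for CUB's `cub::DeviceReduce::Sum`, which writes its result through a device pointer and takes a stream argument, and it allocates the temporary storage before capture begins. The names `CHECK`, `scale_by_sum`, and `captured_normalize` are hypothetical, and `cudaGraphInstantiateWithFlags` assumes CUDA 11.4 or newer.

```cuda
#include <cuda_runtime.h>
#include <cub/cub.cuh>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper macro: abort on any CUDA runtime error.
#define CHECK(call)                                                      \
  do {                                                                   \
    cudaError_t err_ = (call);                                           \
    if (err_ != cudaSuccess) {                                           \
      std::fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
      std::exit(EXIT_FAILURE);                                           \
    }                                                                    \
  } while (0)

// Hypothetical consumer kernel: reads the sum straight from device memory.
__global__ void scale_by_sum(float* data, const float* sum, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] /= *sum;
}

// Captures "reduce, then consume" into a graph, launches it once, and
// returns the first normalized element.
float captured_normalize(const std::vector<float>& host_in) {
  const int n = static_cast<int>(host_in.size());
  float* d_in = nullptr;
  float* d_sum = nullptr;
  void* d_temp = nullptr;
  size_t temp_bytes = 0;

  // All allocations, copies, and the temp-storage size query happen
  // BEFORE capture begins.
  CHECK(cudaMalloc(&d_in, n * sizeof(float)));
  CHECK(cudaMalloc(&d_sum, sizeof(float)));
  CHECK(cudaMemcpy(d_in, host_in.data(), n * sizeof(float),
                   cudaMemcpyHostToDevice));
  CHECK(cub::DeviceReduce::Sum(nullptr, temp_bytes, d_in, d_sum, n));
  CHECK(cudaMalloc(&d_temp, temp_bytes));

  cudaStream_t stream;
  cudaGraph_t graph;
  cudaGraphExec_t exec;
  CHECK(cudaStreamCreate(&stream));

  // Only stream-ordered work is issued between begin and end of capture.
  CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
  CHECK(cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_sum, n, stream));
  scale_by_sum<<<(n + 255) / 256, 256, 0, stream>>>(d_in, d_sum, n);
  CHECK(cudaStreamEndCapture(stream, &graph));

  CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
  CHECK(cudaGraphLaunch(exec, stream));
  CHECK(cudaStreamSynchronize(stream));

  float first = 0.0f;
  CHECK(cudaMemcpy(&first, d_in, sizeof(float), cudaMemcpyDeviceToHost));

  CHECK(cudaGraphExecDestroy(exec));
  CHECK(cudaGraphDestroy(graph));
  CHECK(cudaStreamDestroy(stream));
  CHECK(cudaFree(d_temp));
  CHECK(cudaFree(d_sum));
  CHECK(cudaFree(d_in));
  return first;
}
```

The key design choice is that the consumer kernel receives a pointer to the sum rather than the value itself, so nothing on the host has to wait for the reduction while the stream is capturing.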

heapoverflow
- I also found in some older answers that `reduce` only returns to the host; however, in the Thrust documentation `reduce` is declared as `__host__ __device__ T thrust::reduce`. Did things change, so that it can return to the device now? – Cos_ma Apr 03 '20 at 04:52
- If you call it from a `__device__` or `__global__` function, it returns the result on the device. But if you invoke the function from the host side, the return value is in host memory. So you cannot store the result of a reduction that is called from the host directly in a device variable. – heapoverflow Apr 03 '20 at 10:24
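
To illustrate the device-side case described in the comment above, here is a minimal sketch in which a kernel calls `thrust::reduce` with the sequential `thrust::seq` policy and stores the result in device memory. The names `reduce_on_device` and `device_side_sum` are hypothetical, error checking is omitted for brevity, and a single-thread sequential reduction is shown only to make the point about where the return value lives, not as a fast way to reduce.

```cuda
#include <cuda_runtime.h>
#include <thrust/execution_policy.h>
#include <thrust/reduce.h>
#include <vector>

// Hypothetical kernel: one thread runs a sequential Thrust reduction and
// writes the returned value to device memory.
__global__ void reduce_on_device(const float* in, int n, float* out) {
  *out = thrust::reduce(thrust::seq, in, in + n, 0.0f);
}

// Copies the input to the device, reduces it inside the kernel, and copies
// the single result back so it can be inspected on the host.
float device_side_sum(const std::vector<float>& host_in) {
  const int n = static_cast<int>(host_in.size());
  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, sizeof(float));
  cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

  reduce_on_device<<<1, 1>>>(d_in, n, d_out);

  // This blocking copy also waits for the kernel to finish.
  float result = 0.0f;
  cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);

  cudaFree(d_out);
  cudaFree(d_in);
  return result;
}
```

Because the reduction here is an ordinary kernel launch, the host never needs the value until it explicitly copies it back.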