I have a multi-stage pipeline in a Halide::Generator that is scheduled to run on a GPU. My computer has two CUDA-enabled GPUs, and I would like to know whether it's possible to run two instances of this generator in parallel (one on each GPU) and return the two output buffers to the host for further processing.
If this is achievable, could you tell me how it's done, and whether the solution is scalable to a computer with an arbitrary number of GPUs?
Many thanks as always.
=== UPDATE ===
As @Zalman suggested, I've been trying to override the halide_cuda_acquire_context / halide_cuda_release_context functions and use the void *user_context pointer to select the appropriate context. As a starting point I worked from the test/generator/acquire_release_aottest.cpp test. Although I found (and fixed) a bug in that test, I still can't figure out how user_context can be used effectively.
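For context, the override in that test has roughly the shape below. This is a minimal sketch: the cuda_ctx global is a stand-in for a context you create yourself with the CUDA driver API before running the pipeline.

```cpp
#include <cuda.h>

// A context created elsewhere (e.g. with cuCtxCreate); Halide will use it
// instead of creating its own.
static CUcontext cuda_ctx = nullptr;

// Replaces the weak symbol in the Halide CUDA runtime.
extern "C" int halide_cuda_acquire_context(void *user_context, CUcontext *ctx, bool create) {
    *ctx = cuda_ctx;
    return 0;  // zero signals success
}

// Called when the runtime is done with the context; nothing to do for a
// single shared context.
extern "C" int halide_cuda_release_context(void *user_context) {
    return 0;
}
```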
All I've managed to do so far is create several cuda_ctxs, one for each of my devices, and select a single cuda_ctx inside the overridden halide_cuda_acquire_context function, which determines the GPU my generator runs on; see the sketch below.
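Building on the shape above, what I have now looks something like this. It's my own sketch: the device_contexts table and the convention that user_context points at an int device ordinal are my choices, not anything Halide prescribes.

```cpp
#include <cuda.h>
#include <vector>

// One CUDA context per device, created once at startup.
static std::vector<CUcontext> device_contexts;

void init_contexts() {
    cuInit(0);
    int n = 0;
    cuDeviceGetCount(&n);
    device_contexts.resize(n);
    for (int i = 0; i < n; i++) {
        CUdevice dev;
        cuDeviceGet(&dev, i);
        cuCtxCreate(&device_contexts[i], 0, dev);
    }
}

// Pick the context for whichever device ordinal user_context points at.
extern "C" int halide_cuda_acquire_context(void *user_context, CUcontext *ctx, bool create) {
    int device = user_context ? *static_cast<const int *>(user_context) : 0;
    *ctx = device_contexts[device];
    return 0;
}
```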
So my question boils down to this: how and where should the user_context pointer be set?
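My current guess (which I'd like confirmed) is that user_context is supplied by the caller: compiling the generator with the user_context target feature makes the AOT-generated function take an extra leading void * argument, which the runtime forwards to halide_cuda_acquire_context. If that's right, the calling side would look something like this, where my_pipeline is a placeholder name for my generator's AOT output:

```cpp
#include <thread>

#include "HalideBuffer.h"
#include "my_pipeline.h"  // hypothetical AOT header, built with a target containing "user_context"

void run_on_device(int device) {
    Halide::Runtime::Buffer<float> in(1024), out(1024);
    in.fill(1.0f);
    // The leading void * lands in halide_cuda_acquire_context as
    // user_context, where my override reads it as a device ordinal.
    my_pipeline(&device, in, out);
    out.copy_to_host();  // bring the result back for host-side processing
}

int main() {
    std::thread t0(run_on_device, 0);  // GPU 0
    std::thread t1(run_on_device, 1);  // GPU 1
    t0.join();
    t1.join();
    return 0;
}
```

If that guess is off, I'd appreciate a pointer to where the value is actually supposed to come from.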