
I have a multi-stage pipeline in a Halide::Generator that is scheduled to run on a GPU. My computer has two CUDA-enabled GPUs, and I would like to know whether it's possible to run two instances of this generator in parallel (one on each GPU) and return the two output buffers to the host for further processing.

If this is achievable, could you tell me how it's done, and whether the solution is scalable to a computer with an arbitrary number of GPUs?

Many thanks as always.

=== UPDATE ===

As @Zalman suggested, I've been trying to override the halide_cuda_acquire/release_context functions and use the void* user_context pointer to select the appropriate contexts. As a starting point, I used the test/generator/acquire_release_aottest.cpp script. Although I found a bug in the script and fixed it, I can't figure out how user_context can be used effectively.

All I've managed so far is to create several cuda_ctxs, one per device, and to select a single cuda_ctx inside the halide_cuda_acquire_context function, which determines the GPU my generator will run on.
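For illustration, the per-device setup described above might look roughly like the following sketch using the CUDA driver API (names such as `cuda_ctxs` and `init_contexts` are placeholders, and error checking is omitted):

```cpp
#include <cuda.h>
#include <vector>

// One CUcontext per CUDA device on the machine.
std::vector<CUcontext> cuda_ctxs;

void init_contexts() {
    cuInit(0);
    int device_count = 0;
    cuDeviceGetCount(&device_count);
    for (int i = 0; i < device_count; i++) {
        CUdevice dev;
        cuDeviceGet(&dev, i);
        CUcontext ctx;
        cuCtxCreate(&ctx, 0, dev);  // create a context bound to device i
        cuda_ctxs.push_back(ctx);
    }
    // Note: cuCtxCreate also makes the new context current on the
    // calling thread.
}
```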

So my question boils down to: how and where should the user_context pointer be set?

zanbri
    user_context is passed in to the filter call itself. If `Target::UserContext` is in the target compilation flags (`user_context` in the string form of the target), then the first argument to the AOT filter call is the user context. The value passed will be forwarded to any `halide_cuda_acquire_context` call made on behalf of that filter invocation. One can also pass a user context parameter when calling JITted code, but doing so requires a `ParamMap` to get different values per thread. – Zalman Stern Aug 24 '18 at 18:28
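To illustrate the comment above, here is a hedged caller-side sketch. It assumes the generator was compiled with `user_context` in its target string, and that the AOT-generated function is named `my_filter` (a placeholder); the `GpuContext` struct is likewise a hypothetical wrapper whose address is passed as the user context:

```cpp
#include <cuda.h>
#include "HalideRuntime.h"  // for halide_buffer_t

// Hypothetical per-GPU state; its address is what we pass as user_context.
struct GpuContext {
    CUcontext cuda_ctx;  // a context created on one particular device
};

// With "user_context" in the target string (e.g. "host-cuda-user_context"),
// the AOT-compiled filter takes void *user_context as its first argument.
// "my_filter" is a placeholder for the generated function's name.
extern "C" int my_filter(void *user_context, halide_buffer_t *output);

void run_on_two_gpus(GpuContext *gpu0, GpuContext *gpu1,
                     halide_buffer_t *out0, halide_buffer_t *out1) {
    // Each call (possibly on its own host thread) forwards its user_context
    // to halide_cuda_acquire_context, selecting that call's CUcontext.
    my_filter(gpu0, out0);
    my_filter(gpu1, out1);
}
```

This scales to any number of GPUs: make one context (and one `GpuContext`) per device and launch one filter invocation per context.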

1 Answer


Likely the best way to do this is to define your own version of halide_cuda_acquire_context and halide_cuda_release_context that use the user_context parameter to figure out which CUcontext to use. That way one can make a context on whichever GPU one wants a given kernel to run and then pass in a user_context that points to that context.

This may run into issues if trying to run the same kernel on multiple contexts, due to the kernel not getting compiled on the second context. I think I fixed that, but if not, I will.
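A minimal sketch of such overrides might look like the following, with signatures modeled on those in test/generator/acquire_release_aottest.cpp; the `GpuContext` struct is a hypothetical per-GPU wrapper that the filter call's user_context argument is assumed to point at:

```cpp
#include <cuda.h>

// Hypothetical per-GPU state; the filter call's user_context argument
// is assumed to point at one of these.
struct GpuContext {
    CUcontext cuda_ctx;
};

// Override the weakly-linked Halide runtime hooks so that each filter
// invocation runs on the context carried by its user_context.
extern "C" int halide_cuda_acquire_context(void *user_context, CUcontext *ctx,
                                           bool create = true) {
    *ctx = ((GpuContext *)user_context)->cuda_ctx;
    return 0;
}

extern "C" int halide_cuda_release_context(void *user_context) {
    // The contexts are owned by the application and outlive the filter
    // invocations, so there is nothing to release here.
    return 0;
}
```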

Zalman Stern
  • I've updated my question following your answer. Thank you. – zanbri Aug 23 '18 at 17:35
  • Your comment helped a lot -- thank you! I did run into an error I can't seem to debug (possibly the same issue you mentioned might come up). I opened a new issue -- https://github.com/halide/Halide/issues/3242 -- with the description, as well as a link to code, should you want to test it. Many thanks again. – zanbri Aug 27 '18 at 12:37