Exploitation of GPU using Halide

Question

I'm implementing an algorithm in Halide and comparing it against a hand-tuned CUDA version of the same algorithm. Optimizing the Halide implementation mostly went well, but it is still a bit slower than the hand-tuned version. So I measured the exact execution time of each Func using nvvp (NVIDIA Visual Profiler). From that, I found that the hand-tuned implementation overlaps the execution of several similar functions, each of which is implemented as a Func in the Halide implementation. It uses CUDA streams to do this.

I would like to know whether I can exploit the GPU in a similar way in Halide.

Thank you for reading.

score 1 · Answer 1 · answered Jun 06 '17 at 23:15

Currently the runtime has no support for CUDA streams. It might be possible to replace the runtime with something that can do this, but there is no extra information passed in to control the concurrency. (The runtime is somewhat designed to be replaceable, but there is a bit of a notion of a single queue and full dependency information is not passed down. It may be possible to reconstruct the dependencies from the inputs and outputs, but that starts to be a lot of work to solve a problem the compiler should be solving itself.)

We're talking about how to express such control in the schedule. One possibility is to use the support being prototyped in the async branch to do this, but we haven't totally figured out how to apply this to GPUs. (The basic idea is scheduling a Func async on a GPU would put it on a different stream. We'd need to use GPU synchronization APIs to handle producer/consumer dependencies.) Ultimately this is something we are interested in exploiting, but work needs to be done.
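For reference, the "different stream plus GPU synchronization" pattern looks roughly like this in plain CUDA. This is only a sketch of the general technique, not Halide runtime code or anything Halide generates today. The kernels `producer`, `independent`, and `consumer`, and the function `run_two_stream_pipeline`, are hypothetical stand-ins for Funcs in a pipeline. The producer runs on one stream, an unrelated stage runs on a second stream, and the consumer waits on an event so that the producer/consumer dependency is respected without blocking the host.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stages standing in for Halide Funcs (names are illustrative only).
__global__ void producer(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * i;
}

__global__ void independent(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i + 1.0f;
}

__global__ void consumer(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Returns true if every CUDA call succeeded and the output matches 3*i + 1.
// Keep n small enough that 3*n is exactly representable in float.
bool run_two_stream_pipeline(int n) {
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *a = nullptr, *b = nullptr, *c = nullptr;
    cudaStream_t s0, s1;
    cudaEvent_t a_ready;

    bool ok = cudaMalloc(&a, bytes) == cudaSuccess &&
              cudaMalloc(&b, bytes) == cudaSuccess &&
              cudaMalloc(&c, bytes) == cudaSuccess &&
              cudaStreamCreate(&s0) == cudaSuccess &&
              cudaStreamCreate(&s1) == cudaSuccess &&
              cudaEventCreateWithFlags(&a_ready, cudaEventDisableTiming) == cudaSuccess;
    if (!ok) return false;  // sketch only: resources are not released on this path

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Stream s0: the producer, followed by an event marking its completion.
    producer<<<blocks, threads, 0, s0>>>(a, n);
    cudaEventRecord(a_ready, s0);

    // Stream s1: work with no dependency on the producer, free to overlap with it.
    independent<<<blocks, threads, 0, s1>>>(b, n);

    // The consumer needs the producer's output, so s1 waits on the event first.
    cudaStreamWaitEvent(s1, a_ready, 0);
    consumer<<<blocks, threads, 0, s1>>>(a, b, c, n);

    // Copy back on s1 (ordered after the consumer) and wait for it to finish.
    std::vector<float> host(n);
    ok = cudaMemcpyAsync(host.data(), c, bytes, cudaMemcpyDeviceToHost, s1) == cudaSuccess &&
         cudaStreamSynchronize(s1) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (host[i] == 3.0f * i + 1.0f);

    cudaEventDestroy(a_ready);
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return ok;
}
```

Calling `run_two_stream_pipeline(1 << 20)` from a `main` compiled with `nvcc` exercises the whole pattern. Whether the two streams actually overlap depends on the device and on how much of the GPU each kernel occupies, so a profiler timeline is the way to confirm it. The event is the piece a compiler-driven scheme would have to insert automatically for each producer/consumer edge that crosses streams.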
