This is a follow-up question to this one.
Suppose I have a CUDA kernel:

template <unsigned ThreadsPerWarp>
__global__ void foo(bar_t* a, const baz_t* b);
and I'm implementing a specialization of it for the case where ThreadsPerWarp is 32 (this circumvents the valid criticism in Talonmies' answer to my previous question).
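Concretely, the specialization I have in mind looks roughly like this (a sketch only; the body is elided, and bar_t and baz_t are the placeholder types from above):

template <>
__global__ void foo<32>(bar_t* a, const baz_t* b)
{
    // ... body tuned for the ThreadsPerWarp == 32 case ...
}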
In the body of this function (or of other __device__ functions called from it), should I prefer using the constant value of ThreadsPerWarp, or is it better to use warpSize? Or will it all be the same to the compiler in terms of the PTX it generates?
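To illustrate, here is a minimal sketch of where the choice comes up (lane_index is a made-up helper name, not part of my actual code):

template <unsigned ThreadsPerWarp>
__device__ unsigned lane_index()
{
    // Option A: the template parameter, a genuine compile-time constant
    return threadIdx.x % ThreadsPerWarp;

    // Option B: the built-in warpSize variable
    // return threadIdx.x % warpSize;
}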