This is a follow-up question to [this one](http://stackoverflow.com/q/36047035/1593077).

Suppose I have a CUDA kernel

template<unsigned ThreadsPerWarp>
__global__ void foo(bar_t* a, const baz_t* b);

and I'm implementing a specialization of it for the case of `ThreadsPerWarp` being 32 (this circumvents the valid criticism in talonmies' answer to my previous question).

In the body of this function (or of other `__device__` functions called from it), should I prefer using the constant value `ThreadsPerWarp`? Or is it better to use `warpSize`? Or will it all be the same to the compiler in terms of the PTX it generates?
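
To make the distinction concrete, here is a minimal sketch of what I mean (the lane computation is only an illustrative example, using the placeholder types from above):

```
template<unsigned ThreadsPerWarp>
__global__ void foo(bar_t* a, const baz_t* b)
{
    // Option 1: a compile-time constant; division/modulo by it can be
    // strength-reduced to shifts and masks.
    unsigned lane_from_param = threadIdx.x % ThreadsPerWarp;

    // Option 2: warpSize, a built-in value which is not treated as a
    // compile-time constant by the front end.
    unsigned lane_from_builtin = threadIdx.x % warpSize;
    // ...
}
```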

  • Using `ThreadsPerWarp` should be better from an optimization point of view. – Robert Crovella Feb 21 '17 at 15:35
  • @RobertCrovella: Always and with no exceptions? That is, nvcc doesn't have optimizers which "notice" `warpSize` more clearly than they do `32`? After all, talonmies said in the linked-to question that the generated PTX doesn't assume a warp size of 32. – einpoklum Feb 21 '17 at 16:55
  • 1
    In [the answer you linked](http://stackoverflow.com/q/36047035/1593077) from @talonmies, he stated "At the same time, using `warpSize` in the code prevents optimization, since formally it is not a compile-time known constant." I'm pretty much just reiterating that (I agree with it, obviously). I guess you are now asking for proof that something doesn't exist - harder to do. – Robert Crovella Feb 21 '17 at 17:21
  • @RobertCrovella: I actually was thinking maybe something like that _does_ exist, otherwise talonmies would not be advocating for `warpSize` that strongly. But you've essentially answered my question with your comments. – einpoklum Feb 21 '17 at 17:30

1 Answer


No, don't use `warpSize`.

It seems that, other than potential future-proofing (which in practice is questionable), there is no advantage to using it. Instead, you can very well use something like:

enum : unsigned { warp_size = 32 };
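
For illustration, here is a hypothetical usage sketch (a warp-wide sum using the `__shfl_down_sync()` intrinsic); since `warp_size` is a compile-time constant, the compiler can fully unroll the loop:

```
// Hypothetical sketch: sum a value across the full warp, using the
// warp_size constant defined above rather than warpSize.
__device__ int warp_reduce_sum(int x)
{
    #pragma unroll
    for (unsigned offset = warp_size / 2; offset > 0; offset /= 2)
        x += __shfl_down_sync(0xFFFFFFFFu, x, offset);
    return x;
}
```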