
I'm trying to implement my own 64-bit shuffle function in CUDA. However, if I do it like this:

static __inline__ __device__ double __shfl_xor(double var, int laneMask, int width=warpSize)
{
    int hi, lo;
    asm volatile( "mov.b64 { %0, %1 }, %2;" : "=r"(lo), "=r"(hi) : "d"(var) );
    hi = __shfl_xor( hi, laneMask, width );
    lo = __shfl_xor( lo, laneMask, width );
    return __hiloint2double( hi, lo );
}

After this definition, every subsequent call to __shfl_xor resolves to this 64-bit version, regardless of the argument type. For example, if I write

int a;
a = __shfl_xor( a, 16 );

the call still uses the double version. One workaround would be to give the functions different names. But I call this shuffle function from a template function, so different names would force me to write a separate version of the template for 64-bit floating-point values, which is not very neat.

So how can I overload __shfl_xor(double, ...) while at the same time making sure that __shfl_xor(int, ...) is still called where appropriate?

Rainn

1 Answer


All integral types and float can be implicitly converted to double. Given a choice between the built-in function and your double overload, the compiler here might be picking yours for every argument type.
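
The conversion rule itself can be checked on the host. The following is a minimal sketch in plain C++, not CUDA: `pick`, `only_double`, `both`, and `overload_demo` are hypothetical names, and the sketch does not reproduce how nvcc treats its built-in intrinsics. It only shows that an int or float argument converts to double when a double overload is the sole candidate, and that an exact int overload wins once it is visible.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins (not CUDA intrinsics): each overload reports
// which version overload resolution selected.
namespace only_double {
    inline std::string pick(double) { return "double"; }
}

namespace both {
    inline std::string pick(double) { return "double"; }
    inline std::string pick(int)    { return "int"; }
}

// Returns true after checking the two cases described above.
inline bool overload_demo() {
    int a = 1;
    float f = 1.0f;

    // Only a double overload is visible: int and float convert implicitly.
    assert(only_double::pick(a) == "double");
    assert(only_double::pick(f) == "double");

    // An exact int overload is also visible: it beats the int -> double
    // conversion, while a double argument still selects the double overload.
    assert(both::pick(a) == "int");
    assert(both::pick(2.0) == "double");
    return true;
}
```

Calling `overload_demo()` from any host function runs the four checks.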

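Have you tried creating a function with a different name, and overloading it both for your specialized double variant and as thin forwarding wrappers for the other types? Because all the overloads share one name, your template code can still make a single call.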

For example:

static __inline__ __device__ double foo_shfl_xor(double var, int laneMask, int width=warpSize)
{
    // Your double shuffle implementation
}

static __inline__ __device__ int foo_shfl_xor(int var, int laneMask, int width=warpSize)
{
    // For every non-double data type you use
    // Just call the original shuffle function
    return __shfl_xor(var, laneMask, width);
}

// Your code that uses shuffle
double d;
int a;
foo_shfl_xor(d, ...); // Calls your custom shuffle
foo_shfl_xor(a, ...); // Calls default shuffle
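
To see how a single template can call the wrapper for both types, here is a host-side sketch in plain C++ that simulates a warp as an array of 32 lanes. It is an illustration under stated assumptions, not device code: `sim_shfl_xor`, `kWarpSize`, `warp_sum`, and `iota_lanes` are hypothetical names, and `sim_shfl_xor` merely imitates the lane-exchange pattern of the built-in 32-bit shuffle. The double overload splits each value into two 32-bit halves, shuffles each half, and recombines them, which is the same idea as in the question.

```cpp
#include <array>
#include <cassert>
#include <cstring>

constexpr int kWarpSize = 32;  // assumption: mirrors CUDA's warpSize
static_assert(sizeof(double) == 2 * sizeof(int), "sketch assumes 32-bit int");

// Hypothetical host-side stand-in for the built-in 32-bit shuffle:
// lane `lane` reads the value held by lane `lane ^ laneMask`.
inline int sim_shfl_xor(const std::array<int, kWarpSize>& lanes, int lane, int laneMask) {
    assert(lane >= 0 && lane < kWarpSize && (lane ^ laneMask) < kWarpSize);
    return lanes[lane ^ laneMask];
}

// int overload: forwards to the 32-bit stand-in.
inline int foo_shfl_xor(const std::array<int, kWarpSize>& lanes, int lane, int laneMask) {
    return sim_shfl_xor(lanes, lane, laneMask);
}

// double overload: split every value into two 32-bit halves,
// shuffle each half separately, then recombine.
inline double foo_shfl_xor(const std::array<double, kWarpSize>& lanes, int lane, int laneMask) {
    std::array<int, kWarpSize> first{}, second{};
    for (int i = 0; i < kWarpSize; ++i) {
        int halves[2];
        std::memcpy(halves, &lanes[i], sizeof(double));
        first[i] = halves[0];
        second[i] = halves[1];
    }
    int out[2] = { sim_shfl_xor(first, lane, laneMask),
                   sim_shfl_xor(second, lane, laneMask) };
    double result;
    std::memcpy(&result, out, sizeof(double));
    return result;
}

// Generic butterfly reduction: one call site, resolved per type by overloading.
// After the loop every lane holds the sum of all lanes.
template <typename T>
std::array<T, kWarpSize> warp_sum(std::array<T, kWarpSize> lanes) {
    for (int mask = kWarpSize / 2; mask > 0; mask /= 2) {
        std::array<T, kWarpSize> next{};
        for (int lane = 0; lane < kWarpSize; ++lane)
            next[lane] = lanes[lane] + foo_shfl_xor(lanes, lane, mask);
        lanes = next;
    }
    return lanes;
}

// Helper for the demo: lane i holds step * i.
template <typename T>
std::array<T, kWarpSize> iota_lanes(T step) {
    std::array<T, kWarpSize> lanes{};
    for (int i = 0; i < kWarpSize; ++i) lanes[i] = step * static_cast<T>(i);
    return lanes;
}
```

With lanes holding 0, 1, ..., 31, `warp_sum(iota_lanes<int>(1))` leaves 496 in every lane, and `warp_sum(iota_lanes<double>(0.5))` leaves 248.0, so the same template body serves both types without a separate 64-bit version.
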
Ashwin Nanjappa