type casting to unsigned long long in CUDA?

Question

Basically, what I want is a function that works like hiloint2uint64(): it joins two 32-bit integers and reinterprets the result as a uint64.

I cannot find any function in CUDA that does this. Is there any PTX code that can do this kind of type casting?

score 3 · Answer 1 · answered Aug 19 '13 at 11:52

You can define your own function like this:

__host__ __device__ unsigned long long int hiloint2uint64(int h, int l)
{
    // Little-endian layout: element 0 holds the low 32 bits, element 1 the high 32 bits.
    int combined[] = { l, h };

    return *reinterpret_cast<unsigned long long int*>(combined);
}

sgarizvi

C++ reinterpret_cast is the correct way to handle this type of case. – talonmies Aug 19 '13 at 13:21

score 0 · Answer 2 · answered Dec 10 '21 at 22:49

Maybe a bit late by now, but probably the safest way to do this is to do it "manually" with bit-shifts and or:

uint32_t ui_h = h;
uint32_t ui_l = l;
return (uint64_t(ui_h) << 32) | uint64_t(ui_l);

Note that the solution in the other answer isn't safe, because the array of ints might not be 8-byte aligned. Shifting a few bits is also faster than a memory read/write anyway.
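For completeness, the shift-and-or approach can be wrapped in a full function. This is a minimal sketch rather than the answerer's exact code: the `HD` macro is an assumption added so the same source builds with nvcc and with a plain C++ compiler, and the name `hiloint2uint64` is taken from the question. The intermediate `uint32_t` conversions matter because widening a negative `int` directly to `uint64_t` sign-extends into the upper 32 bits.

```cpp
#include <cstdint>

// Assumption: HD is a helper macro introduced for this sketch so the
// function builds both under nvcc and under a plain host C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Joins two 32-bit integers: h goes to bits 63..32, l to bits 31..0.
HD inline std::uint64_t hiloint2uint64(int h, int l)
{
    // Convert to unsigned 32-bit first so a negative value cannot
    // sign-extend into the upper half when widened to 64 bits.
    const std::uint32_t ui_h = static_cast<std::uint32_t>(h);
    const std::uint32_t ui_l = static_cast<std::uint32_t>(l);
    return (static_cast<std::uint64_t>(ui_h) << 32) | static_cast<std::uint64_t>(ui_l);
}
```

With this version, `hiloint2uint64(1, -1)` gives `0x00000001FFFFFFFF`. Without the `uint32_t` step, the sign-extended `l` would set every bit and the same call would give `0xFFFFFFFFFFFFFFFF`.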

Sebastian · Answer 3 · 2021-12-11T07:42:49.080

Use uint2 instead of an array, but declare the temporary variable as a 64-bit value (unsigned long long int) so that the alignment is guaranteed. Be careful about the order of l and h.

__host__ __device__ __forceinline__ unsigned long long int hiloint2uint64(unsigned int h, unsigned int l)
{
    unsigned long long int result;
    uint2& src = *reinterpret_cast<uint2*>(&result);
    src.x = l;
    src.y = h;
    return result;
}

The CUDA registers have a size of 32 bits anyway. In the best case the compiler won't need any extra code. In the worst case it has to reorder the registers by moving a 32-bit value.

Godbolt example showing how well it gets optimized: https://godbolt.org/z/3r9WYK9e7
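Since the question also asks about PTX: the `mov.b64` instruction can pack two 32-bit registers into one 64-bit register, with the first element of the vector operand becoming the low half. The sketch below is only an illustration and is not taken from the Godbolt example. The function name `pack_hilo` and the `HD` macro are made up for it. The inline assembly is used only when compiling device code, with a shift-based path everywhere else.

```cpp
#include <cstdint>

// Assumption: HD is a helper macro introduced for this sketch so the
// function builds both under nvcc and under a plain host C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: h becomes bits 63..32, l becomes bits 31..0.
HD inline std::uint64_t pack_hilo(std::uint32_t h, std::uint32_t l)
{
#ifdef __CUDA_ARCH__
    // Device path: mov.b64 d, {a, b} computes d = a | (b << 32).
    unsigned long long result;
    asm("mov.b64 %0, {%1, %2};" : "=l"(result) : "r"(l), "r"(h));
    return result;
#else
    // Host path (and non-CUDA compilers): plain shift and or.
    return (static_cast<std::uint64_t>(h) << 32) | l;
#endif
}
```

In practice the compiler typically produces the same register packing from the shift version or the uint2 version, so the inline PTX is rarely necessary.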
