I'm trying to optimize a piece of code for A100 GPUs (Ampere generation). Right now we use uint64_t, but I am seeing uint2 data types being used instead in some CUDA code. Does the uint2 offer advantages for register usage? I know there are a limited number of 64-bit registers; does uint2 split the x,y ints across 32-bit registers for better occupancy? I couldn't find any specific information about register storage for these data types, so any links to documentation would be appreciated.
Without a concrete example, I perceive this as a speculative question about implementation artifacts (so not documented, not guaranteed, subject to change at any time). By observation: GPU registers comprise 32 bits each. Any 64-bit data type therefore must occupy two registers. Where 64-bit operands are consumed or produced by machine instructions, they occupy a register pair (two *consecutive* registers, with the least significant 32 bits stored in an even-numbered register, e.g. R0, R2, etc.). In all other cases the compiler is free to store a 64-bit operand in any two registers. – njuffa Mar 07 '22 at 21:59
No, there are no performance/storage differences between `uint2` and `uint64_t` in CUDA. – Robert Crovella Mar 07 '22 at 23:09
1 Answer
> Does the uint2 offer advantages for register usage?
No.
> I know there are a limited number of 64-bit registers
Indeed. Extremely limited, i.e. zero. There are no 64-bit registers in any CUDA-compatible GPU I am aware of. When the compiler encounters a 64-bit type, it composes it from two adjacent 32-bit registers.
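To make the "two 32-bit halves" point concrete, here is a minimal sketch (mine, not taken from any NVIDIA documentation) showing that a `uint64_t` and a `uint2` carry the same 8 bytes. The helper names `split64` and `join64` are hypothetical, and the x = low half / y = high half convention is an assumption chosen for illustration, not something the hardware mandates.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Both types are 8 bytes with the same alignment, i.e. two 32-bit
// registers' worth of data either way.
static_assert(sizeof(uint2) == sizeof(uint64_t), "same size");
static_assert(alignof(uint2) == alignof(uint64_t), "same alignment");

// Hypothetical helper: split a 64-bit value into 32-bit halves.
// Assumed convention: x holds the low 32 bits, y the high 32 bits.
__host__ __device__ inline uint2 split64(uint64_t v) {
    return make_uint2(static_cast<unsigned int>(v),
                      static_cast<unsigned int>(v >> 32));
}

// Hypothetical helper: reassemble the 64-bit value from the halves.
__host__ __device__ inline uint64_t join64(uint2 p) {
    return (static_cast<uint64_t>(p.y) << 32) | p.x;
}
```

If you want to check register usage for your own kernel rather than rely on this reasoning, compiling with `nvcc -Xptxas -v` prints the per-kernel register count, and `cuobjdump -sass` shows the generated machine code.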
> does uint2 split the x,y ints across 32-bit registers for better occupancy?
No. All the CUDA built-in vector types exist for memory bandwidth optimization (there are vector load/store instructions in PTX) and for compatibility with the texture/surface hardware which can do filtering on some of those types, which can be better for performance.
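As a rough sketch of the memory-access side, the two kernels below copy the same buffer once as `uint64_t` and once as `uint2`. The kernel names and the host helper `copies_match` are hypothetical, and the code assumes `n > 0` and a buffer from `cudaMalloc` (which is suitably aligned for both types). Each thread moves 8 bytes per element in both versions; in the PTX you would typically expect a scalar 64-bit load for the first and a two-element 32-bit vector load for the second, but that is an observation about current compilers, not a guarantee.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Copy one 8-byte element per thread as a 64-bit scalar.
__global__ void copy_u64(const uint64_t* in, uint64_t* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Copy one 8-byte element per thread as a pair of 32-bit values.
__global__ void copy_u2(const uint2* in, uint2* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Hypothetical host helper (assumes n > 0): runs both kernels on the same
// input and returns true if each reproduces the input exactly.
bool copies_match(int n) {
    std::vector<uint64_t> src(n), via_u64(n), via_u2(n);
    for (int i = 0; i < n; ++i) src[i] = 0x0123456789ABCDEFull * (i + 1);

    const size_t bytes = n * sizeof(uint64_t);
    uint64_t *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_in, src.data(), bytes, cudaMemcpyHostToDevice);

        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        copy_u64<<<blocks, threads>>>(d_in, d_a, n);
        // Reinterpret the same 8-byte-aligned buffers as uint2 arrays.
        copy_u2<<<blocks, threads>>>(reinterpret_cast<const uint2*>(d_in),
                                     reinterpret_cast<uint2*>(d_b), n);

        // The blocking copies also surface any kernel execution error.
        ok = cudaMemcpy(via_u64.data(), d_a, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(via_u2.data(), d_b, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);  // cudaFree(nullptr) is a harmless no-op
    cudaFree(d_a);
    cudaFree(d_b);
    return ok && via_u64 == src && via_u2 == src;
}
```

Calling `copies_match(1 << 16)` from `main` and building with `nvcc -arch=sm_80` should be enough to try this on an A100. Where the vector types pay off is with the wider ones (e.g. `uint4`), where a single thread can move 16 bytes per load.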

talonmies