Does it make sense to use unsigned short integers in CUDA programs, both in registers (to save register space) and in shared memory (for faster access)?
I wrote a template `__device__` function that uses registers and shared memory, and specialized it for `uint` and `ushort`.
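A minimal sketch of what such a test could look like, not the original code. All names here (`kBlock`, `addNeighbor`, `neighborSum`, `runNeighborSum`) are hypothetical, and the template is instantiated for both types rather than explicitly specialized. The `static_cast` marks where the compiler has to narrow a result back to 16 bits: C++ integer promotion widens `unsigned short` operands before the addition, and GPU registers are 32 bits wide.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlock = 256;  // assumed block size, chosen for illustration

// Hypothetical device function: reads two values from shared memory into
// registers and adds them. For T = unsigned short the operands are promoted
// to int for the addition, and the result is narrowed back to T.
template <typename T>
__device__ T addNeighbor(const T* tile, int t, int count)
{
    T a = tile[t];
    T b = tile[(t + 1) % count];
    return static_cast<T>(a + b);
}

// Stages one element per thread in shared memory, then combines it with
// the next element of the same block (wrapping around inside the block).
template <typename T>
__global__ void neighborSum(const T* in, T* out, int n)
{
    __shared__ T tile[kBlock];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : T(0);
    __syncthreads();
    if (i < n) {
        out[i] = addNeighbor(tile, threadIdx.x, blockDim.x);
    }
}

// Host helper: copies the input to the device, runs the kernel for type T,
// and returns the result. Error checking is omitted to keep the sketch short.
template <typename T>
std::vector<T> runNeighborSum(const std::vector<T>& hIn)
{
    int n = static_cast<int>(hIn.size());
    T* dIn = nullptr;
    T* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(T));
    cudaMalloc(&dOut, n * sizeof(T));
    cudaMemcpy(dIn, hIn.data(), n * sizeof(T), cudaMemcpyHostToDevice);

    int blocks = (n + kBlock - 1) / kBlock;
    neighborSum<T><<<blocks, kBlock>>>(dIn, dOut, n);

    std::vector<T> hOut(n);
    cudaMemcpy(hOut.data(), dOut, n * sizeof(T), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return hOut;
}
```

Calling `runNeighborSum` once with a `std::vector<unsigned int>` and once with a `std::vector<unsigned short>` builds both variants from the same source, so any difference in register count or speed comes from the element type alone.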
Results:
- For `uint`: 25 registers, 460 MB/sec.
- For `ushort`: 26 registers, 420 MB/sec.
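Register counts like these can be read at compile time by passing `-Xptxas -v` to `nvcc`, or queried at run time with `cudaFuncGetAttributes`. The following is a small sketch of the run-time route. `scaleKernel` and `registersPerThread` are hypothetical names, and the kernel is only a stand-in for the real one.

```cuda
#include <cuda_runtime.h>

// Stand-in kernel (hypothetical); substitute the kernel being measured.
template <typename T>
__global__ void scaleKernel(T* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = static_cast<T>(data[i] * 3u);
    }
}

// Returns the number of registers each thread uses for the given
// instantiation of scaleKernel, or -1 if the query fails.
template <typename T>
int registersPerThread()
{
    cudaFuncAttributes attr{};
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel<T>);
    return (err == cudaSuccess) ? attr.numRegs : -1;
}
```

Comparing `registersPerThread<unsigned int>()` with `registersPerThread<unsigned short>()` shows whether the narrower type actually saves registers on a given architecture and compiler version.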
So, at least in this test, there is no reason to use `unsigned short int`: it takes one more register and is about 9% slower.