Hi, I'm new to CUDA programming. I got this piece of PTX assembly code from building a program with OpenCL.
I'm wondering what the numbers and characters in it mean, such as %f7, %f11, %rd3, %r3, %f, %p.
I'm guessing that rd probably refers to a register and that the number is the register number, and perhaps the percent sign is just the way operands are written for a PTX instruction (e.g. ld.shared.f32)?
If my guesses are correct, then what does %r3 mean? Is it a different class of register? And what about %p and %f7?
Thank you in advance.
ld.global.f32 %f7, [%rd16];
st.shared.f32 [%rd2], %f7;
bar.sync 0;
ld.shared.f32 %f8, [%rd4];
ld.shared.f32 %f9, [%rd3];
fma.rn.f32 %f10, %f9, %f8, %f32;
ld.shared.f32 %f11, [%rd4+32];
ld.shared.f32 %f12, [%rd3+4];
fma.rn.f32 %f13, %f12, %f11, %f10;
ld.shared.f32 %f14, [%rd4+64];
ld.shared.f32 %f15, [%rd3+8];
fma.rn.f32 %f16, %f15, %f14, %f13;
ld.shared.f32 %f17, [%rd4+96];
ld.shared.f32 %f18, [%rd3+12];
fma.rn.f32 %f19, %f18, %f17, %f16;
ld.shared.f32 %f20, [%rd4+128];
ld.shared.f32 %f21, [%rd3+16];
fma.rn.f32 %f22, %f21, %f20, %f19;
ld.shared.f32 %f23, [%rd4+160];
ld.shared.f32 %f24, [%rd3+20];
fma.rn.f32 %f25, %f24, %f23, %f22;
ld.shared.f32 %f26, [%rd4+192];
ld.shared.f32 %f27, [%rd3+24];
fma.rn.f32 %f28, %f27, %f26, %f25;
ld.shared.f32 %f29, [%rd4+224];
ld.shared.f32 %f30, [%rd3+28];
fma.rn.f32 %f32, %f30, %f29, %f28;
bar.sync 0;
add.s32 %r37, %r37, 8;
add.s32 %r36, %r36, %r11;
add.s32 %r38, %r38, 1;
setp.lt.s32 %p5, %r38, %r8;
[Edited]
Many thanks to Robert Crovella for the thorough answer! In case anyone is wondering, this is the register declaration part(?) at the top of my PTX file:
.reg .pred %p<6>;
.reg .f32 %f<33>;
.reg .b32 %r<39>;
.reg .b64 %rd<19>;
.shared .align 4 .b8 sgemm$blockA[256];
// demoted variable
.shared .align 4 .b8 sgemm$blockB[256];
The shared memory arrays have a size of 256 because I set them to 16 * 16.
And the specific section of the reference document is here
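In case it helps anyone else, here is a small Python sketch of how I now read those .reg lines. As far as I understand it, a declaration like `.reg .f32 %f<33>` creates 33 registers named %f0 through %f32, so the letters after the percent sign are just a name prefix tied to a type and the number is an index. The helper names `parse_reg_decls` and `describe` are my own, made up for illustration; they are not part of any CUDA or OpenCL tool.

```python
import re

# The .reg lines copied from the top of my PTX file.
DECLS = """
.reg .pred %p<6>;
.reg .f32 %f<33>;
.reg .b32 %r<39>;
.reg .b64 %rd<19>;
"""

# Matches e.g. ".reg .f32 %f<33>;" and captures ("f32", "f", "33").
DECL_RE = re.compile(r"\.reg\s+\.(\w+)\s+%(\w+)<(\d+)>;")


def parse_reg_decls(text):
    """Map each register name prefix to (PTX type, number declared)."""
    return {
        prefix: (ptx_type, int(count))
        for ptx_type, prefix, count in DECL_RE.findall(text)
    }


def describe(reg, table):
    """Describe a numbered register such as '%rd3' (hypothetical helper)."""
    m = re.fullmatch(r"%([a-z]+)(\d+)", reg)
    if m is None:
        raise ValueError(f"not a numbered register: {reg}")
    prefix, index = m.group(1), int(m.group(2))
    ptx_type, count = table[prefix]
    # "%f<33>" declares indices 0..32, so index 33 would be out of range.
    if index >= count:
        raise ValueError(f"{reg} is outside the declared range 0..{count - 1}")
    return f"{reg}: .{ptx_type} register {index} of {count}"


if __name__ == "__main__":
    table = parse_reg_decls(DECLS)
    for reg in ["%f7", "%rd3", "%r3", "%p5"]:
        print(describe(reg, table))
```

Read this way, %f7 is a 32-bit float register, %r3 is a 32-bit integer-sized register, %rd3 is a 64-bit register (used for addresses in the code above), and %p5 is a predicate register holding the result of setp.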