The way to do this is to
print *(void * @parameter *) addr
where addr is the address inside the constant bank 0 that should be printed.
Example
Suppose we have a simple kernel in a file called foo.cu
:
#include <cuda.h>
#include <stdio.h>
#include <cuda_runtime.h>
__global__ void myKernel(int a, int b, int *d)
{
*d = a + b;
}
int main(int argc, char *argv[]) {
if (argc < 3) {
printf("Requires inputs a and b to be specified\n");
return 0;
}
int * dev_d;
int d;
cudaMalloc(&dev_d, sizeof(*dev_d));
myKernel<<<1, 1>>>(atoi(argv[1]), atoi(argv[2]), dev_d);
cudaMemcpy(&d, dev_d, sizeof(d), cudaMemcpyDeviceToHost);
cudaFree(dev_d);
printf("D is: %d\n", d);
return 0;
}
which is compiled via
$ nvcc foo.cu -o foo.out
Next, suppose we are interested in disassembling this program, so we execute cuda-gdb
with a command-line for our program:
$ cuda-gdb --args ./foo.out 10 15
Inside cuda-gdb
, we get to the kernel by typing
(cuda-gdb) set cuda break_on_launch application
(cuda-gdb) start
Temporary breakpoint 1, 0x000055555555b12a in main ()
(cuda-gdb) cont
Inside the kernel, we view the disassembly we are interested in debugging:
(cuda-gdb) x/15i $pc
=> 0x555555b790a8 <_Z8myKerneliiPi+8>: MOV R1, c[0x0][0x20]
0x555555b790b0 <_Z8myKerneliiPi+16>: MOV R0, c[0x0][0x144]
0x555555b790b8 <_Z8myKerneliiPi+24>: MOV R2, c[0x0][0x148]
0x555555b790c0 <_Z8myKerneliiPi+32>:
0x555555b790c8 <_Z8myKerneliiPi+40>: MOV R3, c[0x0][0x14c]
0x555555b790d0 <_Z8myKerneliiPi+48>: IADD R0, R0, c[0x0][0x140]
0x555555b790d8 <_Z8myKerneliiPi+56>: STG.E [R2], R0
0x555555b790e0 <_Z8myKerneliiPi+64>:
0x555555b790e8 <_Z8myKerneliiPi+72>: NOP
0x555555b790f0 <_Z8myKerneliiPi+80>: NOP
0x555555b790f8 <_Z8myKerneliiPi+88>: NOP
0x555555b79100 <_Z8myKerneliiPi+96>:
0x555555b79108 <_Z8myKerneliiPi+104>: EXIT
0x555555b79110 <_Z8myKerneliiPi+112>: BRA 0x70
0x555555b79118 <_Z8myKerneliiPi+120>: NOP
The second argument being passed to the IADD
instruction is in one of the constant memory banks. Let's find out what its value actually is. We advance go to the IADD
instruction:
(cuda-gdb) stepi 4
0x0000555555b790d0 in myKernel(int, int, int*)<<<(1,1,1),(1,1,1)>>> ()
(cuda-gdb) x/i $pc
=> 0x555555b790d0 <_Z8myKerneliiPi+48>: IADD R0, R0, c[0x0][0x140]
We can now obtain the contents of c[0x0][0x140]
as follows:
(cuda-gdb) print (int) *(void * @parameter *) 0x140
$1 = 10
Here, we knew the argument should have 32 bits, so we cast it as an (32-bit) int
. If we hadn't done this, we would get too many bits, e.g.:
(cuda-gdb) print *(void * @parameter *) 0x140
$2 = 0xf0000000a
Note the hexadecimal format can be retained by adding /x after the print
command:
(cuda-gdb) print/x (int) *(void * @parameter *)0x140
$3 = 0xa