I am working on parallel programming with CUDA GPUs. I compiled a CUDA version of a matrix multiplication program with the nvcc compiler. Now I need to look at the intermediate code the compiler produces, so I can understand how the parallelization is carried out. How can I get access to it?
-
Have a look at [Generate CUDA PTX file in Visual Studio](http://www.orangeowlsolutions.com/archives/464) for Parallel Thread eXecution (PTX) language and [Obtaining CUDA assembly](http://www.orangeowlsolutions.com/archives/555) for CUDA assembly if you want to generate lower level code and have a better insight on what your GPU is actually doing. – Vitality Oct 24 '13 at 19:46
-
@JackOLantern this is a pretty basic question I think, and your comment seems appropriate as an answer. If you would supply it as an answer I would upvote it. – Robert Crovella Oct 25 '13 at 13:22
-
@RobertCrovella Thanks, Robert. I have extended my comment to an answer. – Vitality Oct 25 '13 at 16:25
1 Answer
Generate CUDA PTX file - Visual Studio instructions
If you need to generate PTX files from your Visual Studio CUDA project, proceed as follows:
- Access the properties panel of your Project.
- Open the CUDA C/C++ configurator.
- Set "Keep Preprocessed Files" to Yes.
- Set a directory of destination in "Keep Directory".
Obtaining CUDA assembly - Visual Studio instructions
PTX is an intermediate language designed to be portable across multiple GPU architectures, but it is not the machine code ultimately executed by the GPU. It is compiled by the ptxas compiler component into the final machine code, also referred to as SASS, for the particular architecture at hand. The machine code actually executed by the GPU can be inspected by disassembling it with the cuobjdump utility. To do so, in a Visual Studio CUDA project go to:
Project -> Properties -> Configuration Properties -> CUDA C/C++ -> Common -> Keep Preprocessed Files -> choose Yes (--keep)
Open a command window, go to the Release folder of your VS project:
\..\Project_Name\Project_Name\Release
and type:
cuobjdump yourkernel.sm_21.cubin --dump-sass
yourkernel.sm_21.cubin is the file containing a fat binary which may contain one or more device-specific binary images (in this case, specific to sm_21) as well as (optionally) PTX.
In the command window, you will obtain something like
Function : _Z11simple_copyPfPKf
.headerflags @"EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)"
/*0000*/ MOV R1, c[0x1][0x100]; /* 0x2800440400005de4 */
/*0008*/ NOP; /* 0x4000000000001de4 */
/*0010*/ MOV R0, c[0x0][0x14]; /* 0x2800400050001de4 */
/*0018*/ S2R R2, SR_CTAID.Y; /* 0x2c00000098009c04 */
/*0020*/ SHL R0, R0, 0x5; /* 0x6000c00014001c03 */
/*0028*/ S2R R3, SR_TID.Y; /* 0x2c0000008800dc04 */
/*0030*/ ISCADD R3, R2, R3, 0x5; /* 0x400000000c20dca3 */
/*0038*/ S2R R4, SR_CTAID.X; /* 0x2c00000094011c04 */
/*0040*/ S2R R5, SR_TID.X; /* 0x2c00000084015c04 */
/*0048*/ ISCADD R2, R4, R5, 0x5; /* 0x4000000014409ca3 */
/*0050*/ IMAD R2, R0, R3, R2; /* 0x200400000c009ca3 */
/*0058*/ ISCADD R0, R2, c[0x0][0x24], 0x2; /* 0x4000400090201c43 */
/*0060*/ ISCADD R2, R2, c[0x0][0x20], 0x2; /* 0x4000400080209c43 */
/*0068*/ LD R0, [R0]; /* 0x8000000000001c85 */
/*0070*/ ST [R2], R0; /* 0x9000000000201c85 */
/*0078*/ EXIT ; /* 0x8000000000001de7 */
.....................................
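Since the question mentions compiling with nvcc directly, the same artifacts can also be produced without Visual Studio. The following is only a sketch under stated assumptions: the file name `simple_copy.cu`, the `ARCH` variable and its default, and the kernel body itself are illustrative (the kernel is a guess whose signature matches the mangled name `_Z11simple_copyPfPKf` in the listing above, not the original source). It uses `nvcc -ptx` for PTX, `nvcc -cubin` for a device binary, and `cuobjdump -sass` for the disassembly.

```shell
#!/bin/sh
# Sketch: command-line route to PTX and SASS for one kernel file.
# ARCH is an assumption: set it to an architecture your GPU and toolkit support.
ARCH="${ARCH:-sm_75}"

# Hypothetical kernel, written to disk so the example is self-contained.
cat > simple_copy.cu <<'EOF'
#define TILE_DIM 32
__global__ void simple_copy(float *odata, const float *idata)
{
    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;
    int width = gridDim.x * TILE_DIM;
    odata[y * width + x] = idata[y * width + x];
}
EOF

if command -v nvcc >/dev/null 2>&1; then
    # PTX: the portable intermediate code.
    nvcc -arch="$ARCH" -ptx simple_copy.cu -o simple_copy.ptx &&
    # cubin: device binary for the chosen architecture.
    nvcc -arch="$ARCH" -cubin simple_copy.cu -o simple_copy.cubin &&
    # SASS: disassembly of the machine code inside the cubin.
    cuobjdump -sass simple_copy.cubin
else
    echo "nvcc not found on PATH; skipping compilation" >&2
fi
```

The resulting `simple_copy.ptx` is a plain text file and can be opened in any editor. Passing `--keep` to a normal nvcc build should likewise leave the intermediate files (including PTX) on disk instead of deleting them.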
