I'm getting an obscure exception when loading the PTX assembly generated by LLVM's NVPTX backend. (I'm loading the PTX from ManagedCuda - http://managedcuda.codeplex.com/ )
ErrorNoBinaryForGPU: This indicates that there is no kernel image available that is suitable for the device. This can occur when a user specifies code generation options for a particular CUDA source file that do not include the corresponding device configuration.
Here is the LLVM IR for the module (it's a bit odd because it's generated by a tool):
; ModuleID = 'Module'
target triple = "nvptx64-nvidia-cuda"
%testStruct = type { i32 }
define void @kernel(i32 addrspace(1)*) {
entry:
%1 = alloca %testStruct
store %testStruct zeroinitializer, %testStruct* %1
%2 = load %testStruct* %1
call void @structtest(%testStruct %2)
ret void
}
define void @structtest(%testStruct) {
entry:
ret void
}
!nvvm.annotations = !{!0}
!0 = metadata !{void (i32 addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
And here is the resulting PTX:
//
// Generated by LLVM NVPTX Back-End
//
.version 3.1
.target sm_20
.address_size 64
// .globl kernel
.visible .func structtest
(
.param .b0 structtest_param_0
)
;
.visible .entry kernel(
.param .u64 kernel_param_0
)
{
.local .align 8 .b8 __local_depot0[8];
.reg .b64 %SP;
.reg .b64 %SPL;
.reg .s32 %r<2>;
.reg .s64 %rl<2>;
mov.u64 %rl1, __local_depot0;
cvta.local.u64 %SP, %rl1;
mov.u32 %r1, 0;
st.u32 [%SP+0], %r1;
// Callseq Start 0
{
.reg .b32 temp_param_reg;
// <end>}
.param .align 4 .b8 param0[4];
st.param.b32 [param0+0], %r1;
call.uni
structtest,
(
param0
);
//{
}// Callseq End 0
ret;
}
// .globl structtest
.visible .func structtest(
.param .b0 structtest_param_0
)
{
ret;
}
I have no idea how to read PTX, but I have a feeling the problem has to do with the .b0 bit of .param .b0 structtest_param_0 in the structtest function definition.
Passing non-structure values (like integers or pointers) works fine, and the .b0 bit of the function reads something sane like .b32 or .b64 when doing so.
Changing the triple to nvptx-nvidia-cuda (32-bit) makes no difference, and neither does including or excluding the data layout suggested in http://llvm.org/docs/NVPTXUsage.html
Is this a bug in the NVPTX backend, or am I doing something wrong?
Update:
I'm looking through this - http://llvm.org/docs/doxygen/html/NVPTXAsmPrinter_8cpp_source.html - and it appears that the type falls through to line 01568. A struct is obviously not a primitive type, so Ty->getPrimitiveSizeInBits() returns zero. (At least that's my guess, anyway.)
Do I need to add a special case that checks whether the argument is a structure, takes its address, makes the argument byval, and dereferences the struct afterwards? That seems like a hacky solution, but I'm not sure how else to fix it.
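To make the byval idea concrete, here is a minimal sketch of how the module above might look if the struct were passed by address instead of by value. It assumes the same old-style IR syntax used in this question; the value name %s is made up for illustration, and whether the NVPTX backend lowers this to a correctly sized .param is not verified here.

```llvm
; ModuleID = 'Module'
target triple = "nvptx64-nvidia-cuda"

%testStruct = type { i32 }

define void @kernel(i32 addrspace(1)*) {
entry:
  ; Keep the struct in a stack slot, as in the original module.
  %s = alloca %testStruct
  store %testStruct zeroinitializer, %testStruct* %s
  ; Pass the address; byval marks the pointee as a caller-made copy,
  ; so the callee still sees by-value semantics.
  call void @structtest(%testStruct* byval %s)
  ret void
}

; The parameter is now a pointer with the byval attribute
; rather than a first-class aggregate value.
define void @structtest(%testStruct* byval %s) {
entry:
  ret void
}

!nvvm.annotations = !{!0}
!0 = metadata !{void (i32 addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
```

Inside structtest, the fields would then be read through %s with getelementptr and load, which is the "dereferencing the struct afterwards" step mentioned above.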