I'm getting an obscure exception when loading the PTX assembly generated by LLVM's NVPTX backend. (I'm loading the PTX with ManagedCuda - http://managedcuda.codeplex.com/ )

ErrorNoBinaryForGPU: This indicates that there is no kernel image available that is suitable for the device. This can occur when a user specifies code generation options for a particular CUDA source file that do not include the corresponding device configuration.

Here is the LLVM IR for the module (it's a bit weird since it's generated by a tool)

; ModuleID = 'Module'
target triple = "nvptx64-nvidia-cuda"

%testStruct = type { i32 }

define void @kernel(i32 addrspace(1)*) {
entry:
  %1 = alloca %testStruct
  store %testStruct zeroinitializer, %testStruct* %1
  %2 = load %testStruct* %1
  call void @structtest(%testStruct %2)
  ret void
}

define void @structtest(%testStruct) {
entry:
  ret void
}

!nvvm.annotations = !{!0}

!0 = metadata !{void (i32 addrspace(1)*)* @kernel, metadata !"kernel", i32 1}

And here is the resulting PTX:

//
// Generated by LLVM NVPTX Back-End
//

.version 3.1
.target sm_20
.address_size 64

        // .globl       kernel
.visible .func structtest
(
        .param .b0 structtest_param_0
)
;

.visible .entry kernel(
        .param .u64 kernel_param_0
)
{
        .local .align 8 .b8     __local_depot0[8];
        .reg .b64       %SP;
        .reg .b64       %SPL;
        .reg .s32       %r<2>;
        .reg .s64       %rl<2>;

        mov.u64         %rl1, __local_depot0;
        cvta.local.u64  %SP, %rl1;
        mov.u32         %r1, 0;
        st.u32  [%SP+0], %r1;
        // Callseq Start 0
        {
        .reg .b32 temp_param_reg;
        // <end>}
        .param .align 4 .b8 param0[4];
        st.param.b32    [param0+0], %r1;
        call.uni
        structtest,
        (
        param0
        );

        //{
        }// Callseq End 0
        ret;
}

        // .globl       structtest
.visible .func structtest(
        .param .b0 structtest_param_0
)
{


        ret;
}

I have no idea how to read PTX, but I have a feeling the problem has to do with the .b0 bit of .param .b0 structtest_param_0 in the structtest function definition.

Passing non-structure values (like integers or pointers) works fine, and the .b0 bit of the function reads something sane like .b32 or .b64 when doing so.

Changing the triple to nvptx-nvidia-cuda (32-bit) makes no difference, nor does including or excluding the data layout suggested in http://llvm.org/docs/NVPTXUsage.html

Is this a bug in the NVPTX backend, or am I doing something wrong?


Update:

I'm looking through this - http://llvm.org/docs/doxygen/html/NVPTXAsmPrinter_8cpp_source.html - and it appears that the type is falling through to line 01568: a struct is obviously not a primitive type, so Ty->getPrimitiveSizeInBits() returns zero. (At least that's my guess, anyway)

Do I need to add a special case for checking to see if it's a structure, taking the address, making the argument byval, and dereferencing the struct afterwards? That seems like a hacky solution, but I'm not sure how else to fix it.
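
To make that workaround concrete, here is a minimal, untested sketch of the byval variant, written in the same IR syntax as the module above. The names %s and %v are only illustrative, and whether the NVPTX backend then emits a valid .param declaration for it is an assumption that would need checking against the generated PTX.

```llvm
; ModuleID = 'Module'
target triple = "nvptx64-nvidia-cuda"

%testStruct = type { i32 }

define void @kernel(i32 addrspace(1)*) {
entry:
  %1 = alloca %testStruct
  store %testStruct zeroinitializer, %testStruct* %1
  ; Pass the struct's address instead of the loaded value;
  ; byval requests a private copy for the callee.
  call void @structtest(%testStruct* byval %1)
  ret void
}

; The parameter is now a pointer, so its size is no longer zero bits.
define void @structtest(%testStruct* byval %s) {
entry:
  ; Dereference inside the callee to get the struct value back.
  %v = load %testStruct* %s
  ret void
}

!nvvm.annotations = !{!0}

!0 = metadata !{void (i32 addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
```

If this works, the thing to look for in the PTX is that structtest_param_0 is no longer declared as .b0.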

1 Answer

Have you tried reading the error log buffer from the JIT compilation? In ManagedCuda this would be something like:

CudaContext ctx = new CudaContext();
CudaJitOptionCollection options = new CudaJitOptionCollection();
// Reserve a 1024-byte buffer for the JIT compiler's error log.
CudaJOErrorLogBuffer err = new CudaJOErrorLogBuffer(1024);
options.Add(err);
try
{
    ctx.LoadModulePTX("test.ptx", options);
}
catch
{
    // Refresh the option values so that err.Value holds the log text.
    options.UpdateValues();
    MessageBox.Show(err.Value);
}

When I load your PTX, it says:

ptxas application ptx input, line 12; fatal : Parsing error near '.b0': syntax error

ptxas fatal : Ptx assembly aborted due to errors

which supports your guess about .b0.

kunzmi
  • Thanks for the info! I didn't know that existed; it'll definitely help in the future. However, it doesn't really do anything for the problem I'm having. –  Jan 21 '14 at 21:43