I'm trying to implement offline compilation of my OpenCL compute shaders for a library I'm researching in preparation for its development using SPIRV. The best way to do this seems to be using clang and LLVM, and I've been following the guide from Khronos Blog:
https://www.khronos.org/blog/whats-new-in-clang-release-14-for-opencl-developers
Using the following commands (I've also tried it with spir64 target, and CL2.0):
> clang -cl-std=CL1.2 -c -target spir -O0 -emit-llvm -o test_cl_spv.bc test_cl_spv.cl
> llvm-spirv test_cl_spv.bc -o test_cl_spv.spv
> spirv-opt test_cl_spv.spv -o test_cl_spv.spv
> spirv-link test_cl_spv.spv -o test_cl_spv.spv
I've downloaded the relevant repositories (clang/llvm v17, SPIRV-Tools, LLVM-SPIRV translator) on my Ubuntu AWS instance, and built the projects (with all the various plugins/projects for LLVM specified in the build options).
The problem I'm having is that even though I can build a SPIRV file using clang and LLVM-SPIRV Translator, which is successfully loaded as a program using clCreateProgramWithIL()
, it returns CL_INVALID_KERNEL_NAME (-46) when trying to get the kernel using clCreateKernel()
on my Intel Iris Xe graphics.
I can get clang to compile this simple compute kernel into a LLVM binary, and can use LLVM-SPRIV Translator to output a spirv file without error:
OpenCL source code:
__kernel void work(__global int * ptr) {
size_t id = get_global_id(0);
ptr[id] = (int)id;
}
Here is the readable version of the LLVM before it's translated to a SPIRV file:
; ModuleID = 'test_cl_spv.cl'
source_filename = "test_cl_spv.cl"
target datalayout = "e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"
target triple = "spir"
; Function Attrs: convergent noinline norecurse nounwind optnone
define dso_local spir_kernel void @work(ptr addrspace(1) noundef align 4 %0) #0 !kernel_arg_addr_space !4 !kernel_arg_access_qual !5 !kernel_arg_type !6 !kernel_arg_base_type !6 !kernel_arg_type_qual !7 {
%2 = alloca ptr addrspace(1), align 4
%3 = alloca i32, align 4
store ptr addrspace(1) %0, ptr %2, align 4
%4 = call spir_func i32 @_Z13get_global_idj(i32 noundef 0) #2
store i32 %4, ptr %3, align 4
%5 = load ptr addrspace(1), ptr %2, align 4
%6 = load i32, ptr %3, align 4
%7 = getelementptr inbounds i32, ptr addrspace(1) %5, i32 %6
store i32 7, ptr addrspace(1) %7, align 4
ret void
}
; Function Attrs: convergent nounwind willreturn memory(none)
declare dso_local spir_func i32 @_Z13get_global_idj(i32 noundef) #1
attributes #0 = { convergent noinline norecurse nounwind optnone "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "uniform-work-group-size"="true" }
attributes #1 = { convergent nounwind willreturn memory(none) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
attributes #2 = { convergent nounwind willreturn memory(none) }
!llvm.module.flags = !{!0, !1}
!opencl.ocl.version = !{!2}
!opencl.spir.version = !{!2}
!llvm.ident = !{!3}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"frame-pointer", i32 2}
!2 = !{i32 1, i32 2}
!3 = !{!"clang version 17.0.0 (https://github.com/llvm/llvm-project.git be83a4b257c8f0dfd74a659261a544483c5df9af)"}
!4 = !{i32 1}
!5 = !{!"none"}
!6 = !{!"int*"}
!7 = !{!""}
While it is not identical, it seems similar to the output on Godbolt. When I load the source file directly with clCreateProgramWithSource()
, it is able to find the kernel and execute the compute shader successfully.
I've made a small example that demonstrates this problem:
void compute_test::Run(Platform pltform, Device device)
{
std::string file_path = "C:/Users/jdrurka1/source/repos/Dynamics-io/Dynamics.io-Testbench/CPP_Bench/shaders/OpenCL/test_cl_spv.spv";
cl_platform_id platform_id = (cl_platform_id)pltform.platform;
// context properties list - must be terminated with 0
cl_context_properties properties[3];
properties[0] = CL_CONTEXT_PLATFORM;
properties[1] = (cl_context_properties)platform_id;
properties[2] = 0;
cl_device_id deviceID = (cl_device_id)device.cl_device;
cl_int err;
cl_context context = clCreateContext(properties, 1, &deviceID, NULL, NULL, &err);
cl_command_queue command_queue = clCreateCommandQueue(context, deviceID, 0, &err);
//open file
std::ifstream infile(file_path, std::ios::binary);
char* buffer;
//get length of file
infile.seekg(0, infile.end);
size_t length = infile.tellg();
infile.seekg(0, infile.beg);
if (length == 0)
return;
//read file
printf("Reading binary file of length %i\n", (int)length);
buffer = new char[length];
infile.read(buffer, length);
cl_program program = clCreateProgramWithIL(context, buffer, length, &err);
printf("CreateProgramWithIL: res %i\n", err);
std::string args;
cl_int build_res = clBuildProgram(program, 1, &deviceID, args.c_str(), NULL, NULL);
printf("BuildProgram: res %i\n", build_res);
cl_kernel kernel = clCreateKernel(program, "work", &err);
printf("Create kernel '%s': %i\n", "work", err);
char errorStr[1000];
size_t e_size = 0;
int res = clGetProgramBuildInfo(program, deviceID, CL_PROGRAM_BUILD_LOG, 1000, errorStr, &e_size);
printf("LOG (%i): %s\n", e_size, errorStr);
}
(I've also tried with clCreateProgramWithBinary()).
Which produces this console output (the program prints the platforms/devices and selects a device before calling Run(). In this case it is choosing the Intel device):
Platform NVIDIA CUDA:
NVIDIA Corporation - NVIDIA T500
Platform Intel(R) OpenCL HD Graphics:
Intel(R) Corporation - Intel(R) Iris(R) Xe Graphics
Reading binary file of length 900
CreateProgramWithIL: res 0
BuildProgram: res 0
Create kernel 'work': -46 (CL_INVALID_KERNEL_NAME)
LOG (1):
I'm not sure what I am missing here to get OpenCL to load the kernel. When I look at the extensions, the spirv extension is present. I've tried running spirv-opt and spirv-link on the files, with the same result.
Any guidance is appreciated.