I'm following the instructions in this SO answer, but when I try to build the resulting PTX file with clBuildProgram, I get the following error:
ptxas fatal : Unresolved extern function 'get_group_id'
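This ptxas message appears when the PTX declares a function but never defines it. One rough way to list such functions is to scan the PTX text for `.func` declarations that end in `;` instead of opening a `{` body. The C sketch below assumes plain PTX text with no comments. `next_undefined_func` is a hypothetical helper for illustration, not part of any NVIDIA or LLVM tool, and it is not a real PTX parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the next ".func" that is declared without a body.
   Copies its name into out (outlen must be >= 1) and returns a pointer just
   past the terminating ';', or NULL when no further bodiless .func exists.
   Assumes comment-free PTX; this is a text scan, not a parser. */
static const char *next_undefined_func(const char *p, char *out, size_t outlen)
{
    while ((p = strstr(p, ".func")) != NULL) {
        const char *q = p + 5;
        while (isspace((unsigned char)*q))
            q++;
        /* Skip an optional return-parameter list such as "(.param .b64 r)". */
        if (*q == '(') {
            q = strchr(q, ')');
            if (q == NULL)
                return NULL;
            q++;
            while (isspace((unsigned char)*q))
                q++;
        }
        /* Copy the function name (truncated if out is too small). */
        size_t n = 0;
        while ((isalnum((unsigned char)q[n]) || q[n] == '_' || q[n] == '$') &&
               n + 1 < outlen) {
            out[n] = q[n];
            n++;
        }
        out[n] = '\0';
        /* A bare declaration ends with ';'; a definition opens a '{' body first. */
        const char *semi = strchr(q + n, ';');
        const char *brace = strchr(q + n, '{');
        p = q + n;
        if (n > 0 && semi != NULL && (brace == NULL || semi < brace))
            return semi + 1;
    }
    return NULL;
}
```

Calling it repeatedly, feeding each returned pointer back in, lists every bodiless `.func`. For PTX containing the `get_group_id` declaration shown below, it reports `get_group_id`.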
In the PTX file, I have a declaration like the following for every OpenCL built-in function I call:
.func (.param .b64 func_retval0) get_group_id
(
.param .b32 get_group_id_param_0
)
;
None of these declarations appear in the PTX that the OpenCL runtime generates when I give it the .cl source file. There, each call is instead replaced with a read of the proper special register.
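For reference, the usual correspondence is that an OpenCL work-group is a CUDA thread block (a CTA in PTX terms). Under that assumption, the work-item queries map onto PTX special registers as sketched below, with the dimension argument selecting the `.x`, `.y`, or `.z` component. `ptx_sreg_for` is a hypothetical name used only for illustration. `get_global_id` is deliberately left out: it has no single register and is computed from several of them.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed mapping from OpenCL work-item built-ins to PTX special registers. */
static const struct {
    const char *opencl;
    const char *ptx;
} sreg_map[] = {
    { "get_local_id",   "%tid"    },  /* thread index within the block/work-group */
    { "get_local_size", "%ntid"   },  /* block/work-group size */
    { "get_group_id",   "%ctaid"  },  /* block/work-group index */
    { "get_num_groups", "%nctaid" },  /* number of blocks/work-groups */
};

/* Hypothetical helper: writes e.g. "%ctaid.x" for ("get_group_id", 0) into out.
   Returns 0 on success, -1 for an unknown built-in, a dimension above 2,
   or a buffer that is too small. */
static int ptx_sreg_for(const char *builtin, unsigned dim, char *out, size_t outlen)
{
    static const char axes[] = "xyz";
    if (dim > 2)
        return -1;
    for (size_t i = 0; i < sizeof sreg_map / sizeof sreg_map[0]; i++) {
        if (strcmp(sreg_map[i].opencl, builtin) == 0) {
            int n = snprintf(out, outlen, "%s.%c", sreg_map[i].ptx, axes[dim]);
            return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
        }
    }
    return -1;
}
```

So `get_group_id(0)` would be expected to show up as a read of `%ctaid.x` in runtime-generated PTX, rather than as a call to an external function.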
Following these other instructions (which link against a different libclc library) instead gives me a segmentation fault during the LLVM IR to PTX compilation step, with the following error:
fatal error: error in backend: Cannot cast between two non-generic address spaces
Are those instructions still valid? Is there something else I should be doing?
I'm using the latest version of libclc, Clang 3.7, and NVIDIA driver 352.39.