
I am trying to compile the code below both to a static library and to an object file:

Halide::Func f("f");
Halide::Var x("x");

f(x) = x;
f.gpu_tile(x, 4);
f.bound(x, 0, 16);

Halide::Target target = Halide::get_target_from_environment();
target.set_feature(Halide::Target::OpenCL);
target.set_feature(Halide::Target::Debug);
// f.compile_to_static_library("mylib", {}, "f", target);
// f.compile_to_file("mylib", {}, "f", target);

With static linking everything works fine and the output is correct:

Halide::Buffer<int> output(16);
f(output.raw_buffer());
output.copy_to_host();
std::cout << output(10) << std::endl;

But when I link the object file into a shared object,

gcc -shared -pthread mylib.o -o mylib.so

and open it from code (Ubuntu 16.04),

void* handle = dlopen("mylib.so", RTLD_NOW);
int (*func)(halide_buffer_t*);
*(void**)(&func) = dlsym(handle, "f");
func(output.raw_buffer());

I receive a CL_INVALID_MEM_OBJECT error. Here is the debug log:

CL: halide_opencl_init_kernels (user_context: 0x0, state_ptr: 0x7f1266b5a4e0, program: 0x7f1266957480, size: 1577
    load_libopencl (user_context: 0x0)
    Loaded OpenCL runtime library: libOpenCL.so
    create_opencl_context (user_context: 0x0)
    Got platform 'Intel(R) OpenCL', about to create context (t=6249430)
    Multiple CL devices detected. Selecting the one with the most cores.
      Device 0 has 20 cores
      Device 1 has 4 cores
    Selected device 0
      device name: Intel(R) HD Graphics
      device vendor: Intel(R) Corporation
      device profile: FULL_PROFILE
      global mem size: 1630 MB
      max mem alloc size: 815 MB
      local mem size: 65536
      max compute units: 20
      max workgroup size: 256
      max work item dimensions: 3
      max work item sizes: 256x256x256x0
    clCreateContext -> 0x1899af0
    clCreateCommandQueue 0x1a26a80
    clCreateProgramWithSource -> 0x1a26ab0
    clBuildProgram 0x1a26ab0 -D MAX_CONSTANT_BUFFER_SIZE=854799155 -D MAX_CONSTANT_ARGS=8
    Time: 1.015832e+02 ms
CL: halide_opencl_run (user_context: 0x0, entry: kernel_f_s0_x___deprecated_block_id_x___block_id_x, blocks: 4x1x1, threads: 4x1x1, shmem: 0
    clCreateKernel kernel_f_s0_x___deprecated_block_id_x___block_id_x ->     Time: 1.361700e-02 ms
    clSetKernelArg 0 4 [0x2e00010000000000 ...] 0
    clSetKernelArg 1 8 [0x2149040 ...] 1
Mapped dev handle is: 0x2149040
Error: CL: clSetKernelArg failed: CL_INVALID_MEM_OBJECT
Aborted (core dumped)

Thank you very much for your help! Commit state: c7375fa. I would be glad to provide extra information if necessary.

Dmitry Kurtaev
  • Update: using `halide_set_custom_print`, I found that with static linking `clCreateContext` is called once, in `halide_opencl_init_kernels`, but with dynamic linking `clCreateContext` is called twice: in `halide_opencl_init_kernels` and in `halide_opencl_device_malloc`. The created contexts are different. – Dmitry Kurtaev Mar 21 '17 at 07:45
  • I think I found the problem. My sample depends on Halide too. With a single file that has no Halide dependency, dynamic linking works. It seems that in the described case the symbol `clCreateContext` is duplicated and requires two loads from `libOpenCL.so`. – Dmitry Kurtaev Mar 21 '17 at 08:27

1 Answer


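Solution: in this case the runtime is duplicated (the program that loads the library also depends on Halide, as noted in the comments on the question). Load the shared object with the RTLD_DEEPBIND flag: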

void* handle = dlopen("mylib.so", RTLD_NOW | RTLD_DEEPBIND);

From the dlopen man page (https://linux.die.net/man/3/dlopen):

RTLD_DEEPBIND (since glibc 2.3.4): Place the lookup scope of the symbols in this library ahead of the global scope. This means that a self-contained library will use its own symbols in preference to global symbols with the same name contained in libraries that have already been loaded. This flag is not specified in POSIX.1-2001.
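For completeness, here is a minimal sketch of the loading code with the flag applied and with error checks added. The helper name `load_pipeline` and the alias `pipeline_fn` are hypothetical names introduced only for this illustration; the forward declaration of `halide_buffer_t` stands in for including `HalideRuntime.h`. It assumes glibc, since RTLD_DEEPBIND is a glibc extension.

```cpp
// RTLD_DEEPBIND is a glibc extension; g++ normally defines _GNU_SOURCE already.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <cstdio>

// Stand-in forward declaration; real code would include HalideRuntime.h instead.
struct halide_buffer_t;

// Signature of the generated pipeline "f" from the question.
using pipeline_fn = int (*)(halide_buffer_t*);

// Hypothetical helper: opens lib_path with RTLD_DEEPBIND and looks up symbol.
// Returns nullptr and prints the dlerror() message if either step fails.
pipeline_fn load_pipeline(const char* lib_path, const char* symbol) {
    // RTLD_DEEPBIND makes the library prefer its own symbols over
    // same-named symbols that are already loaded in the global scope.
    void* handle = dlopen(lib_path, RTLD_NOW | RTLD_DEEPBIND);
    if (!handle) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return nullptr;
    }

    dlerror();  // clear any stale error state before dlsym
    void* sym = dlsym(handle, symbol);
    if (!sym) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is null");
        dlclose(handle);
        return nullptr;
    }

    // The handle is intentionally kept open so the function stays valid.
    return reinterpret_cast<pipeline_fn>(sym);
}
```

With this helper, the call from the question would become something like `pipeline_fn func = load_pipeline("./mylib.so", "f");` followed by `func(output.raw_buffer());` once `func` is checked for null. On older glibc versions the program may also need to be linked with `-ldl`.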

Dmitry Kurtaev