12

I've written an OpenCL kernel in a .cl file. It attempts to #include several headers.

Its compilation fails, since the included header files are "not found". I am aware that clBuildProgram can take the -I dir option, which adds the directory dir to the list of directories to be searched for the header files.

On the Khronos forum, this post http://www.khronos.org/message_boards/viewtopic.php?f=37&t=2535 discusses the issue.

They propose using clCreateProgramWithSource and passing it all the sources (including the .h files).

I have a few questions regarding this issue:

  1. Which option is better? (clBuildProgram vs. clCreateProgramWithSource, as described above)
  2. If I use clCreateProgramWithSource how does the compiler know what to include? I mean, which source stands for which included file name?
  3. If I use clBuildProgram and there are several directories with include files, how do I specify them?
phs
YAKOVM

3 Answers

10

OpenCL requires you to use clCreateProgramWithSource() followed by clBuildProgram().

clCreateProgramWithSource() creates and returns a cl_program object.

That cl_program object is input into clBuildProgram().

clBuildProgram() allows you to specify compiler options which include the include file directories. In your case, for header file includes, it will be something like the string:

-I myincludedir1 -I myincludedir2  ...
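As a minimal sketch of how that option string could be assembled for several directories: `build_include_options` is a hypothetical helper name, not part of OpenCL, and it assumes directory names without spaces (a comment further down reports that paths with spaces fail on at least one vendor's compiler).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: joins directories into "-I dir1 -I dir2 ...".
// Assumes the directory names contain no spaces; no quoting is attempted.
std::string build_include_options(const std::vector<std::string> &dirs)
{
    std::string options;
    for (const std::string &dir : dirs)
    {
        assert(!dir.empty());      // "-I" without a directory would be malformed
        if (!options.empty())
            options += ' ';        // separate consecutive options
        options += "-I " + dir;
    }
    return options;
}
```

The resulting string would be passed as the `options` argument of clBuildProgram(), for example via `options.c_str()`.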

The compiler used is the internal OpenCL compiler in the OpenCL SDK you are using. So if you are using AMD's SDK, the AMD OpenCL compiler that is part of their OpenCL SDK will be used. Likewise for Nvidia or Intel.

It's important to check the OpenCL status code for ALL OpenCL function calls. This is mandatory for clCreateProgramWithSource() and clBuildProgram() if you want to see any compiler errors or messages. There is a whole other bit of code to write to get the size of the messages and then retrieve the messages themselves (two calls to clGetProgramBuildInfo() with CL_PROGRAM_BUILD_LOG).

Regexident
Tim Child
4

The Nvidia OpenCL device drivers have a bug when using -I with a certain number of includes and code length. AMD and Intel don't have this problem. My solution is instead to concatenate all the .cl files into one large one at runtime. The disadvantage is that when debugging, the line number of an error refers to the concatenated .cl file and not to the individual .cl files.
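One possible way to soften the line-number problem, sketched under assumptions: OpenCL C supports the C99 preprocessing directives, so a `#line` directive can be emitted before each concatenated file. Whether a given vendor compiler reports the file name from `#line` in its build log is not guaranteed. `concat_with_line_markers` is a hypothetical helper name, and file names are assumed to contain no quotes or backslashes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: joins (file name, file text) pairs into one source
// string, prefixing each part with a #line directive so that diagnostics
// can refer to the original file and line (if the compiler honors #line).
std::string concat_with_line_markers(
    const std::vector<std::pair<std::string, std::string>> &parts)
{
    std::string out;
    for (const auto &part : parts)
    {
        assert(!part.first.empty());                // a file name is required
        out += "#line 1 \"" + part.first + "\"\n";  // restart numbering per file
        out += part.second;
        if (part.second.empty() || part.second.back() != '\n')
            out += '\n';                            // keep files on separate lines
    }
    return out;
}
```

The returned string can be handed to clCreateProgramWithSource() as a single source, exactly like a manually concatenated file.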

I doubt Nvidia will ever fix this. They don't care about OpenCL much anymore.

  • AMD APP has problems with -I too from my experience (it simply doesn't work), though Intel handles it perfectly. – Thomas Mar 09 '13 at 22:29
  • That's interesting. I thought I tested it on the CPU. I don't have an AMD GPU so I can't test it on the GPU. Maybe it's a GPU vs. CPU issue? –  Mar 10 '13 at 07:18
  • No, I tried it on both devices under Windows; the compiler simply doesn't seem to handle relative include paths. Basically, I have "-I cl/" in my compiler command line, and my kernels are arranged neatly in the cl/ directory. This works fine under Intel/Linux, but AMD just won't have any of it, no matter what I try. The only solutions I found were either hardcoding the *absolute path* of each .cl file in the #include directives or adding my cl/ folder to the system $PATH. It could be my installation that's broken though; I don't maintain my Windows system as much as my Linux one. – Thomas Mar 10 '13 at 07:23
  • Well, in this case not working is better than partially/randomly working. I wasted a lot of time trying to figure out why my code was not changing: it would keep using old code and not recognize my changes. In the end I think it's just best to avoid the -I option and add all the .cl files together as one large one at runtime. Another suggestion: I always run on the CPU first before testing on the GPU. GPU bugs sometimes crash my whole system (at least on Windows). That never happens on the CPU. –  Mar 10 '13 at 20:51
  • Quite true, preprocessing it yourself would be the safest option (another idea is to invoke the system preprocessor yourself, if one is available). I also do all my development on CPUs and then iron out any bugs on the GPU. The kernel simply must not fail in a GPU setting, because you won't just get garbage results and/or a segfault; you'll actually kill the driver. I tripped over that a few times when implementing infinite-loop type algorithms on the GPU: if even a single thread gets stuck, you're screwed. – Thomas Mar 10 '13 at 21:04
  • nVidia doesn't allow paths with spaces. Try using an old DOS name path and it will work! – l33t Sep 12 '13 at 10:34
  • Does anyone have any citations/references for this bug? I'm trying to get the Nvidia compiler to find my cl includes during compilation. (AMD and Intel have no problems.) – Soylent Graham Dec 23 '13 at 18:05
1

There is one more dirty trick: you can emulate the include yourself (i.e. something like manual amalgamation). It does not make for very clean code, but it works if your OpenCL compiler doesn't support -I directives (or supports them incorrectly). This approach is not perfect (for example, you lose syntax highlighting), but it can help with old or buggy OpenCL compilers.

A small, simple example of this approach:

std::string load_file(const std::string &file_name, int max_size = 0x100000)
{
    FILE *fp = fopen(file_name.c_str(), "rb");
    if (!fp)
    {
        // print some error or throw exception here
        return std::string();
    }
    char *source = new char[max_size];
    size_t source_size = fread(source, 1, max_size, fp);
    fclose(fp);
    if (!source_size)
    {
        delete[] source;
        // print some error or throw exception here
        return std::string();
    }
    // fread() does not null-terminate the buffer, so pass the length explicitly
    std::string result(source, source_size);
    delete[] source;
    return result;
}

// error checks are omitted for simplification
// note: source.cl must not contain its own #include "header.h" line
std::string full_source = load_file("header.h");
full_source += "\n"; // in case header.h does not end with a newline
full_source += load_file("source.cl");

const char *source_ptr = full_source.c_str();
size_t source_size = full_source.size();
cl_int ret = CL_SUCCESS;
cl_program program = clCreateProgramWithSource(context, 1,
        &source_ptr, &source_size, &ret);
// check ret for CL_SUCCESS here
// now you have your program (include + source)
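The manual amalgamation can also be automated by resolving `#include "..."` lines yourself before handing the text to clCreateProgramWithSource(). The following is only a rough sketch: `expand_includes` is a hypothetical helper, the file contents come from an in-memory table (which could be filled with a function like load_file), only the quoted include form at the start of a line is recognized, and each file is included at most once.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: returns the text of `name` with every
// #include "other" line replaced by the expanded text of `other`.
// `files` maps file names to contents; `seen` prevents repeated
// or circular inclusion.
std::string expand_includes(const std::string &name,
                            const std::map<std::string, std::string> &files,
                            std::set<std::string> &seen)
{
    assert(!name.empty());
    if (!seen.insert(name).second)
        return std::string();                       // already included: skip
    std::map<std::string, std::string>::const_iterator it = files.find(name);
    if (it == files.end())
        return "#error missing include: " + name + "\n";

    const std::string key = "#include \"";
    std::istringstream in(it->second);
    std::string line, out;
    while (std::getline(in, line))
    {
        size_t start = line.find_first_not_of(" \t");
        if (start != std::string::npos &&
            line.compare(start, key.size(), key) == 0)
        {
            size_t begin = start + key.size();
            size_t end = line.find('"', begin);
            if (end != std::string::npos)
            {
                // splice in the included file instead of the directive
                out += expand_includes(line.substr(begin, end - begin),
                                       files, seen);
                continue;
            }
        }
        out += line + "\n";                         // ordinary line: copy as-is
    }
    return out;
}
```

Compared with plain concatenation, this keeps the `#include` lines in the .cl files, so the same sources still work with compilers that handle -I correctly.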
avtomaton