I'm new to C, C++ and OpenCL and doing my best to learn them at the moment. Here's a preexisting C++ function that I'm trying to figure out how to port to OpenCL using either the C or C++ bindings.

#include <vector>

using namespace std;

class Test {

private:

    double a;
    vector<double> b;
    vector<long> c;
    vector<vector<double> > d;

public:

    double foo(long x, double y) {
        // mathematical operations
        // using x, y, a, b, c, d
        // and also b.size()
        // to calculate return value
        return 0.0;
    }

};

Broadly, my question is how to pass all the class members that this function accesses into the binding and the kernel. I understand how to pass in the scalar values, but I'm not sure about the vector values. Is there perhaps a way to pass in pointers to each of the above members, or to memory map them, so that OpenCL's view of them stays in sync with host memory? Broken down, my questions are as follows.

  1. How do I pass in member b and c to the binding and the kernel given that these are of variable size?
  2. How do I pass in member d given that it is two dimensional?
  3. How do I access these members from within the kernel and what types will they be declared as in the arguments to the kernel? Will simply using array index notation i.e. b[0] work for access?
  4. How would I invoke an operation equivalent to b.size() within the kernel function or would I not and instead pass in the size from the binding into the kernel as an extra argument? What happens if it changes?

I would really appreciate either C or C++ binding and kernel code example source code in answers.

Many thanks.

junkie
  • `using namespace std;` - Don't do that in a header, ever. – Ed S. Sep 14 '12 at 18:16
  • @EdS. why would that be? – dominicbri7 Apr 12 '15 at 19:56
  • @dominicbri7: Because you are polluting the global namespace for everyone who includes your header. Maybe I don't want `std` imported into my global namespace. Maybe there's a good reason for that. You went and made the choice for me. – Ed S. Apr 12 '15 at 22:14

1 Answer

  1. You have to allocate an OpenCL buffer and copy your CPU data into it. An OpenCL buffer has a fixed size, so you either have to recreate it if your data size changes or you make it "big enough" and use only a subsection of it if less memory is needed. For example, to create a buffer for b and at the same time copy all of its data to the device:

    cl_int errorcode = CL_SUCCESS;
    cl_mem b_buffer = clCreateBuffer(
        context, // OpenCL context
        CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, // Only read access from kernel,
                                                 // copy data from host
        sizeof(cl_double) * b.size(), // Buffer size in bytes (must be non-zero)
        b.data(), // Pointer to data to copy
        &errorcode); // Return code
    if (errorcode != CL_SUCCESS) { /* handle the error */ }
    

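    It is also possible to have the buffer use host memory directly (CL_MEM_USE_HOST_PTR), but this imposes restrictions on alignment and on how you may access the host memory after creating the buffer. Basically, the implementation is allowed to cache the buffer contents on the device, so the host memory is only guaranteed to be up to date while you have the buffer mapped (clEnqueueMapBuffer); at other times it can contain garbage.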

  2. It depends. Do all the vectors in the second dimension have the same size? Then just flatten them into one contiguous array when uploading them to the OpenCL device. Otherwise it gets more complicated.

  3. You declare buffer arguments as __global pointers in your kernel. For example, __global double *b would be appropriate for the buffer created in 1. You can simply use array notation in the kernel to access the individual elements in the buffer.

  4. You cannot query the buffer size from within the kernel, so you have to pass it manually. This can also happen implicitly, e.g. if the number of work items matches the size of b.
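To make the flattening from point 2 concrete, here is a minimal host-side sketch in plain C++. It assumes every row of d has the same length. The helper names `flatten_rows` and `flat_index` are made up for illustration and are not part of OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not an OpenCL function): copies a rectangular
// vector<vector<double>> into one contiguous row-major array and reports
// the row length through cols_out. Throws if the rows differ in length.
std::vector<double> flatten_rows(const std::vector<std::vector<double> >& d,
                                 std::size_t& cols_out) {
    cols_out = d.empty() ? 0 : d[0].size();
    std::vector<double> flat;
    flat.reserve(d.size() * cols_out);
    for (std::size_t r = 0; r < d.size(); ++r) {
        if (d[r].size() != cols_out) {
            throw std::invalid_argument("rows differ in length; cannot flatten");
        }
        flat.insert(flat.end(), d[r].begin(), d[r].end());
    }
    return flat;
}

// Row-major position of element (row, col). A kernel would use the same
// arithmetic, with cols passed in as an extra size argument.
std::size_t flat_index(std::size_t row, std::size_t col, std::size_t cols) {
    assert(col < cols);
    return row * cols + col;
}
```

The flattened vector can then be uploaded like b in point 1, and what was `d[row][col]` on the host becomes `d[row * cols + col]` on the device.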

A kernel which can access all of the data for the computation could look like this:

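// double requires device support for cl_khr_fp64;
// the pragma is needed on OpenCL 1.0/1.1
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void foo(long x, double y, double a, __global const double* b, int b_size,
                  __global const long* c, __global const double* d,
                  __global double* result) {
  // Here be dragons
  // Note: long is always 64 bits in OpenCL C, regardless of the host's long
  *result = 0.0;
}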

Note that you also have to allocate memory for the result. It might be necessary to pass additional size arguments should you need them. You would call the kernel as follows:

// Create/fill buffers
// ...

// Set arguments. The kernel's long is 64 bits, so pass x as a cl_long
// rather than as a host long, which is only 32 bits on some platforms.
cl_long x_arg = x;
clSetKernelArg(kernel, 0, sizeof(cl_long), &x_arg);
clSetKernelArg(kernel, 1, sizeof(cl_double), &y);
clSetKernelArg(kernel, 2, sizeof(cl_double), &a);
clSetKernelArg(kernel, 3, sizeof(cl_mem), &b_buffer);
cl_int b_size = static_cast<cl_int>(b.size());
clSetKernelArg(kernel, 4, sizeof(cl_int), &b_size);
clSetKernelArg(kernel, 5, sizeof(cl_mem), &c_buffer);
clSetKernelArg(kernel, 6, sizeof(cl_mem), &d_buffer);
clSetKernelArg(kernel, 7, sizeof(cl_mem), &result_buffer);

// Enqueue kernel
clEnqueueNDRangeKernel(queue, kernel, /* ... depends on your domain */);

// Read back result (CL_TRUE makes this a blocking read)
cl_double result;
clEnqueueReadBuffer(queue, result_buffer, CL_TRUE, 0, sizeof(cl_double), &result,
                    0, NULL, NULL);
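On platforms where the host `long` is only 32 bits, the contents of a `vector<long>` such as c cannot be uploaded as-is into a buffer that the kernel reads as `__global long*`, because `long` is 64 bits in OpenCL C. One possible conversion step is sketched below in plain C++. To keep it compilable without the OpenCL headers it uses `std::int64_t` as a stand-in for `cl_long`, which is an assumption. The names `device_long`, `to_device_longs` and `to_host_long` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Assumption: std::int64_t stands in for cl_long so that this sketch
// compiles without CL/cl.h; real host code would use cl_long directly.
typedef std::int64_t device_long;

// Hypothetical helper: widens each host long to the 64-bit type that the
// kernel's __global long* expects.
std::vector<device_long> to_device_longs(const std::vector<long>& host) {
    return std::vector<device_long>(host.begin(), host.end());
}

// Hypothetical helper: narrows a 64-bit device value back to a host long.
// Returns false instead of silently truncating when the value does not fit.
bool to_host_long(device_long value, long& out) {
    if (value < static_cast<device_long>(std::numeric_limits<long>::min()) ||
        value > static_cast<device_long>(std::numeric_limits<long>::max())) {
        return false;
    }
    out = static_cast<long>(value);
    return true;
}
```

The converted vector's `data()` pointer and `size() * sizeof(cl_long)` would then be the arguments passed to clCreateBuffer for c. The narrowing helper makes explicit the "possible loss of data" that a compiler may warn about when a 64-bit result is assigned to a 32-bit `long`.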
reima
  • Thank you very much reima. That helps a lot. Two questions: (1) my original data is all in C++ types as you know. But all memory allocation in your code above is in cl_types. I understand why. But, in a test program, where I pass two long vectors into a kernel which adds the value at a[0] to b[1], this only works if all types are cl_types in the program including original vector declarations which seems strange as the original data must be C++ types. What am I missing here? (2) How do I use the 'result' as a C++ type above? – junkie Sep 15 '12 at 15:41
  • You are right, I was a bit sloppy with the types there. My example code will only work if `cl_long` is the same type as `long`. If they are not the same, you will probably have to do a conversion step before uploading the data to the device. `cl_long` and `cl_double` are C++ types like any other, they are just typedefs. You can use `result` directly, as it is probably already a `double`. – reima Sep 15 '12 at 15:47
  • Thx. I can confirm that in my test program, using pointers to vector data doesn't work. So maybe a conversion is needed (create a second set of vectors/arrays and copy into it?). And setting result into a long variable gives a warning in VS saying "possible loss of data". cl_long in cl_platform.h is set to `typedef signed __int64 cl_long;` whereas long is 4 bytes. So maybe there is no way to use it as a long without possible loss of data? – junkie Sep 15 '12 at 16:12