This is my OpenCL kernel function and the host code that sets its arguments:

private static String programSource =
        "__kernel void "+
        "sampleKernel(__global float *a,"+
        "             __global float *b,"+
        "             __global float *c,"+
        "             __global float *d)"+
        "{"+
        "   int gid=get_local_id(0);"+
        "   c[gid]=a[gid]+b[gid];"+
        "   d[gid]=a[gid]-1;"+
        "}";

    clSetKernelArg(kernel, 0, Sizeof.cl_mem, Pointer.to(memObjects[0]));
    clSetKernelArg(kernel, 1, Sizeof.cl_mem, Pointer.to(memObjects[1]));
    clSetKernelArg(kernel, 2, Sizeof.cl_mem, Pointer.to(memObjects[2]));
    clSetKernelArg(kernel, 3, Sizeof.cl_mem, Pointer.to(memObjects[3]));

Here all four arrays are stored in global memory. I then tried to move the input arrays a and b to local memory, so my code now looks like this:

private static String programSource =
            "__kernel void "+
            "sampleKernel(__local float *a,"+
            "             __local float *b,"+
            "             __global float *c,"+
            "             __global float *d)"+
            "{"+
            "   int gid=get_local_id(0);"+
            "   c[gid]=a[gid]+b[gid];"+
            "   d[gid]=a[gid]-1;"+
            "}";
        clSetKernelArg(kernel, 0, Sizeof.cl_mem, NULL);
        clSetKernelArg(kernel, 1, Sizeof.cl_mem, NULL);
        clSetKernelArg(kernel, 2, Sizeof.cl_mem, Pointer.to(memObjects[2]));
        clSetKernelArg(kernel, 3, Sizeof.cl_mem, Pointer.to(memObjects[3]));

When I compile the above code, I get the following error:

NULL cannot be resolved to a variable. 

Can anyone identify my mistake?

Thanks in advance!

iH8

2 Answers


Judging by other JOCL examples that use local memory, you are doing the correct thing, but the final argument should be null instead of NULL (the former is the Java keyword; the latter is typically used in C/C++).

That said, the way in which you are actually using local memory inside your kernel is not quite correct (though I appreciate that this may have just been an example kernel to get the host side working). Moving data to/from local memory must be explicitly managed inside the kernel - the host cannot initialise local memory (in your example the local memory buffers will contain garbage values). Your input values need to be passed to the kernel in global memory buffers.
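As a rough illustration of the explicit staging described above, the kernel could take its inputs in global memory and copy them into local buffers itself. This is only a sketch: the class name `LocalCopyKernel` and the extra parameters `localA` and `localB` are hypothetical and not part of the original code, and each local buffer is assumed to hold one float per work-item in the work-group.

```java
// Sketch only: kernel source that stages global inputs into local memory.
// localA and localB are hypothetical extra __local arguments (not in the original code).
final class LocalCopyKernel {
    static final String PROGRAM_SOURCE =
        "__kernel void sampleKernel(__global const float *a,\n" +
        "                           __global const float *b,\n" +
        "                           __global float *c,\n" +
        "                           __global float *d,\n" +
        "                           __local float *localA,\n" +
        "                           __local float *localB)\n" +
        "{\n" +
        "    int gid = get_global_id(0); /* index into the global buffers */\n" +
        "    int lid = get_local_id(0);  /* index within this work-group */\n" +
        "    localA[lid] = a[gid];       /* the kernel fills local memory itself */\n" +
        "    localB[lid] = b[gid];\n" +
        "    barrier(CLK_LOCAL_MEM_FENCE); /* make the copies visible to the whole group */\n" +
        "    c[gid] = localA[lid] + localB[lid];\n" +
        "    d[gid] = localA[lid] - 1;\n" +
        "}\n";

    public static void main(String[] args) {
        // Print the source that would be handed to clCreateProgramWithSource.
        System.out.println(PROGRAM_SOURCE);
    }
}
```

In this particular kernel each work-item only reads back the element it wrote, so the barrier is not strictly required; it is shown because any kernel in which work-items read each other's local data needs it.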

Currently, your calls to clSetKernelArg are only allocating 4 or 8 bytes (Sizeof.cl_mem) for each local memory buffer, which probably isn't what you wanted. That size is fine for global memory arguments, since you are only passing the buffer handle; the actual buffer allocation is performed when you call clCreateBuffer. For local memory arguments, the size is the amount of memory you wish to allocate for the buffer, and so needs to reflect the amount of data you wish to store in local memory (for each work-group).
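To make the size arithmetic concrete, here is a minimal sketch of how the byte count for a local float buffer might be computed on the host. The class name `LocalArgSize` and the work-group size of 64 are assumptions chosen for illustration; the JOCL call appears only in a comment so that the snippet runs without JOCL on the classpath.

```java
// Sketch only: byte size to pass to clSetKernelArg for a __local float buffer.
// The work-group size of 64 is an assumed, illustrative value.
final class LocalArgSize {
    /** Bytes needed for a __local float buffer with one element per work-item. */
    static long localFloatBytes(int workGroupSize) {
        // Float.BYTES is 4, the same value as JOCL's Sizeof.cl_float.
        return (long) workGroupSize * Float.BYTES;
    }

    public static void main(String[] args) {
        int workGroupSize = 64; // assumed local work size
        long bytes = localFloatBytes(workGroupSize);
        System.out.println("local argument size in bytes: " + bytes);
        // With JOCL available, the corresponding host call would look like:
        // clSetKernelArg(kernel, 0, bytes, null);
    }
}
```

With 64 work-items per group this gives 256 bytes per buffer, compared with the 4 or 8 bytes that Sizeof.cl_mem would request.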

jprice
  • Yes, lowercase null worked. Based on your comment, I understand that the host transfers a and b to global memory, and the kernel must then copy the data from global to local memory. Is that right? If so, what would be the correct way to store the arrays a and b in local memory? What changes should I make to clSetKernelArg and the kernel function? Regarding memory allocation in clSetKernelArg, it works well for the global arrays c and d, where I allocate only 4 bytes. Why doesn't that suffice for arrays a and b too? – Sadhana Rayan Feb 09 '15 at 06:13
  • @SadhanaRayan I've added some notes about why the size parameter in `clSetKernelArg` is different for local memory than for global. For your example kernel, your `a` and `b` arrays need to be in global memory, otherwise they can't be initialised by the host. You can then copy them to local memory manually (e.g. `local_a[i] = global_a[i]`) if you wish. This will just make the kernel slower though - your example kernel really has no use case for local memory. – jprice Feb 09 '15 at 08:31
  • I have a larger program to run on the GPU. For now I am simply practicing accessing the different memory spaces. I read an article at this link: http://www.openclblog.com/2011/03/is-your-local-memory-really-local.html Following it, I checked my device and got CL_GLOBAL. Does this mean there will be no improvement even if I store the arrays in local memory? – Sadhana Rayan Feb 09 '15 at 11:25
  • Yes, it's unlikely you'll see any improvement if your device doesn't have a dedicated physical local memory. Local memory may still be useful for sharing data within a work-group though, for example when implementing a parallel reduction. – jprice Feb 09 '15 at 11:32
  • I am a little confused about the GPU architecture. It looks like not all cards have a dedicated physical local memory. In my case, which physical memory will my work-group access for local storage? – Sadhana Rayan Feb 10 '15 at 02:32
  • Most modern discrete GPU architectures should have dedicated local memory storage. If they don't, any memory declared as `local` will map to the same physical memory as `global`, which will typically be some large off-chip storage. – jprice Feb 10 '15 at 09:48

Give it a try with

clSetKernelArg(kernel, 0, Sizeof.cl_mem, new Pointer());
clSetKernelArg(kernel, 1, Sizeof.cl_mem, new Pointer());

This should create a valid null pointer.

Christian
  • I tried both 'new Pointer()' and 'null', and both worked. The compile error is gone, but I still don't get the expected results: c[] contains 0.0 and d[] contains -1.0. As jprice said, I guess the local memory has to be initialized from inside the kernel. What changes are needed in the code to store the initial arrays a and b in local memory? – Sadhana Rayan Feb 09 '15 at 05:37
  • What are your expected results? `a` and `b` are uninitialized float values, so "randomly" they are set to `0.0`. – Christian Feb 11 '15 at 11:37
  • I found the error. I had swapped the input and output arrays: a and b should be global, and c and d local. – Sadhana Rayan Feb 17 '15 at 06:12