
I'm trying to implement a Gaussian filter for images, based on code I found online, using Python and PyOpenCL. My original images are numpy arrays, but I am unsure which data structure I should use to pass the images to the GPU.

Originally, the kernel receives OpenCL Images as input. This works fine and the kernel runs properly, however, I have not found a way to convert the output of the GPU computation (also an OpenCL Image) to a numpy array. This is needed, since I will have to carry out other computations after running the GPU filter.

I tried using a pyOpenCL Array, but had 3 problems in that case:

  1. Did not know how to tell the kernel that the input would be an Array, since it is a pyOpenCL data structure, not an OpenCL one.
  2. Did not find an equivalent of read_imagef to be used on pyOpenCL Arrays, and I use that function in my kernel.
  3. Could not get the GPU result copied back to the host. I would keep getting a 'cl_array does not have module get()' error.

I would like to know:

  1. Is there a way to tell the kernel that it will receive an array, just as I use image2d_t to say that the input is an Image?
  2. What could I use as an equivalent to OpenCL's read_imagef for pyOpenCL Arrays?

Thanks a lot in advance. Kernel code below:

Kernel:

__kernel void gaussian(__read_only image2d_t inputImage,
                        __read_only image2d_t filterImage,
                        __write_only image2d_t outputImage,
                        const int nInWidth,
                        const int nFilterWidth){

const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

const int xOut = get_global_id(0);
const int yOut = get_global_id(1);

float4 sum = (float4)(0.0, 0.0, 0.0, 1.0);

for(int r = 0; r < nFilterWidth; r++){
    for(int c = 0; c < nFilterWidth; c++){

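        // A vector literal needs an explicit (int2) cast; without it the
        // parentheses are evaluated with the comma operator.
        int2 location = (int2)(xOut + r, yOut + c);

        // The filter is indexed by the loop offsets, not by the output position.
        float4 filterVal = read_imagef(filterImage, sampler, (int2)(r, c));
        float4 inputVal = read_imagef(inputImage, sampler, location);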

        sum.x += filterVal.x * inputVal.x;
        sum.y += filterVal.y * inputVal.y;
        sum.z += filterVal.z * inputVal.z;
        sum.w = 1.0;
    }
}

int2 outLocation = (int2)(xOut, yOut);
write_imagef(outputImage, outLocation, sum);
}

2 Answers

This is a complex question, and as I had the same issues, I want to try answering it in detail. Let's break your issues down into smaller parts to see what's going on.

Datatypes

You seem to be confusing some of the datatypes with each other. OpenCL itself has images and buffers (plain arrays): a PyOpenCL Array maps to an OpenCL buffer, just as a PyOpenCL Image maps to an OpenCL image. Mixing the two will work in some special cases, but overall it's not a good idea.

Data Accessing

An image in OpenCL needs a sampler to read from it. An array (buffer) is accessed by plain indexing, much like in Python, except that inside the kernel it is a flat pointer, so a 2D position is indexed as y * width + x.

Movement

Everything you move in OpenCL using PyOpenCL has its own copy function. So, to move an image or array from device to host, be sure to enqueue the corresponding copy function on the command queue of your context.
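To make the copy step concrete, here is a minimal sketch of moving data to the device and back, once through a PyOpenCL Array and once through an Image. It assumes an (H, W, 4) float32, C-contiguous numpy array; the helper names `image_region` and `roundtrip` are made up for illustration, and `pyopencl` is imported inside the function only so that the shape helper stays usable without an OpenCL runtime.

```python
import numpy as np


def image_region(host_array):
    """Return the (width, height) region for an (H, W, C) host array.

    OpenCL images are addressed as (x, y), numpy arrays as (row, column),
    so the first two axes swap.
    """
    height, width = host_array.shape[:2]
    return (width, height)


def roundtrip(host_rgba):
    """Copy an (H, W, 4) float32 array to the device and back, two ways."""
    # Assumption: pyopencl is installed and an OpenCL device is available.
    import pyopencl as cl
    import pyopencl.array as cl_array

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    # Array (buffer): to_device() uploads, get() downloads into a new ndarray.
    dev_array = cl_array.to_device(queue, host_rgba)
    from_array = dev_array.get()

    # Image: create it from the host data, then copy it back with
    # enqueue_copy, which needs origin and region for image transfers.
    fmt = cl.ImageFormat(cl.channel_order.RGBA, cl.channel_type.FLOAT)
    flags = cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR
    image = cl.Image(ctx, flags, fmt, shape=image_region(host_rgba),
                     hostbuf=host_rgba)
    from_image = np.empty_like(host_rgba)
    cl.enqueue_copy(queue, from_image, image,
                    origin=(0, 0), region=image_region(host_rgba))
    return from_array, from_image
```

Calling `roundtrip(np.ones((480, 640, 4), dtype=np.float32))` should hand back two numpy arrays equal to the input, which is the device-to-host conversion the question is missing.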

Dschoni

The underlying OpenCL data structure of pyopencl.Array is the so-called buffer. You can retrieve the buffer object via the base_data attribute of the Array (see the docs). The buffer can be passed in a kernel call; however, the kernel has to be adjusted to handle buffers instead of images (change the kernel argument type to __global float* inputImage etc., and index elements manually, e.g. inputImage[y * width + x], since read_imagef only works on images).
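As a rough illustration of what such a buffer-based kernel could look like, the sketch below rewrites the filter for a single-channel float32 image. The kernel name `gaussian_buf`, the Python function names, and the manual clamping (standing in for CLK_ADDRESS_CLAMP_TO_EDGE) are assumptions of this example, not part of the original code. A plain numpy reference is included so the result can be checked on the host.

```python
import numpy as np

# Buffer version of the filter: flat pointers and manual index arithmetic
# replace image2d_t and read_imagef.
KERNEL_SRC = """
__kernel void gaussian_buf(__global const float *input,
                           __global const float *weights,
                           __global float *output,
                           const int width,
                           const int height,
                           const int filterWidth)
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    float sum = 0.0f;
    for (int r = 0; r < filterWidth; r++) {
        for (int c = 0; c < filterWidth; c++) {
            // Clamp coordinates to the image edge by hand.
            const int sx = clamp(x + c, 0, width - 1);
            const int sy = clamp(y + r, 0, height - 1);
            sum += weights[r * filterWidth + c] * input[sy * width + sx];
        }
    }
    output[y * width + x] = sum;
}
"""


def filter_reference(image, weights):
    """Host-side numpy version of the same computation, for checking."""
    image = np.asarray(image, dtype=np.float32)
    h, w = image.shape
    fw = weights.shape[0]
    out = np.zeros_like(image)
    for r in range(fw):
        for c in range(fw):
            ys = np.clip(np.arange(h) + r, 0, h - 1)
            xs = np.clip(np.arange(w) + c, 0, w - 1)
            out += np.float32(weights[r, c]) * image[np.ix_(ys, xs)]
    return out


def filter_opencl(image, weights):
    """Run gaussian_buf on PyOpenCL Arrays and return a numpy array."""
    # Assumption: pyopencl is installed and an OpenCL device is available.
    import pyopencl as cl
    import pyopencl.array as cl_array

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    program = cl.Program(ctx, KERNEL_SRC).build()

    image_dev = cl_array.to_device(
        queue, np.ascontiguousarray(image, dtype=np.float32))
    weights_dev = cl_array.to_device(
        queue, np.ascontiguousarray(weights, dtype=np.float32))
    output_dev = cl_array.empty_like(image_dev)

    h, w = image_dev.shape
    # .data is the underlying buffer; scalar arguments must be numpy-typed.
    program.gaussian_buf(queue, (w, h), None,
                         image_dev.data, weights_dev.data, output_dev.data,
                         np.int32(w), np.int32(h),
                         np.int32(weights_dev.shape[0]))
    return output_dev.get()
```

On a machine with a working OpenCL device, `filter_opencl(img, w)` and `filter_reference(img, w)` should agree up to floating-point rounding.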

Anyway, the PyOpenCL Array class is designed for writing numpy-style code that is executed on the device. This does not require you to write any kernel code yourself. Instead, you could do something like this:

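import pyopencl as cl
import pyopencl.array as cl_array  # the array submodule must be imported explicitly

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
input_array = cl_array.to_device(queue, input_numpy_array)
filter_array = cl_array.to_device(queue, filter_numpy_array)
output_array = cl_array.zeros_like(input_array)
# half height and half width of filter
fhh, fhw = filter_array.shape[0] // 2, filter_array.shape[1] // 2
# skip the border so the patch slices never get negative or out-of-range indices
for y in range(fhh, input_array.shape[0] - fhh):
    for x in range(fhw, input_array.shape[1] - fhw):
        patch = input_array[y-fhh:y+fhh+1, x-fhw:x+fhw+1]
        # assigned directly instead of via a variable named `sum`, which shadows the built-in
        output_array[y, x] = cl_array.sum(patch * filter_array)
output_numpy_array = output_array.get()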

Note that I assumed a single-channel (gray) image. I also did not test the code above, and I expect the implementation to be horribly inefficient. Edge handling is not covered.

Finally, you should consider not using PyOpenCL Arrays, given your kernel. Create pyopencl.Image objects from your numpy arrays and pass them in the kernel call. This way, you don't have to modify your kernel.
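A sketch of that image-based route might look as follows. It assumes grayscale numpy inputs that are expanded to RGBA float32, uses `pyopencl.image_from_array` for the two inputs, and reads the output image back with `enqueue_copy`. The helper names `to_rgba` and `run_gaussian` are hypothetical, and the kernel string is a lightly corrected copy of the one from the question (explicit `(int2)` casts, filter read at the loop offsets).

```python
import numpy as np

KERNEL_SRC = """
__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP_TO_EDGE |
                               CLK_FILTER_NEAREST;

__kernel void gaussian(__read_only image2d_t inputImage,
                       __read_only image2d_t filterImage,
                       __write_only image2d_t outputImage,
                       const int nInWidth,
                       const int nFilterWidth)
{
    const int xOut = get_global_id(0);
    const int yOut = get_global_id(1);
    float4 sum = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
    for (int r = 0; r < nFilterWidth; r++) {
        for (int c = 0; c < nFilterWidth; c++) {
            float4 filterVal = read_imagef(filterImage, sampler, (int2)(r, c));
            float4 inputVal = read_imagef(inputImage, sampler,
                                          (int2)(xOut + r, yOut + c));
            sum.x += filterVal.x * inputVal.x;
            sum.y += filterVal.y * inputVal.y;
            sum.z += filterVal.z * inputVal.z;
        }
    }
    write_imagef(outputImage, (int2)(xOut, yOut), sum);
}
"""


def to_rgba(gray):
    """Expand a 2-D grayscale array to a contiguous (H, W, 4) float32 array."""
    gray = np.asarray(gray, dtype=np.float32)
    rgba = np.empty(gray.shape + (4,), dtype=np.float32)
    rgba[..., :3] = gray[..., None]  # same value in R, G and B
    rgba[..., 3] = 1.0               # opaque alpha
    return rgba


def run_gaussian(image_gray, filter_gray):
    """Filter image_gray on the device and return the result as numpy."""
    # Assumption: pyopencl is installed and the device supports images.
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    program = cl.Program(ctx, KERNEL_SRC).build()

    in_rgba = to_rgba(image_gray)
    filter_rgba = to_rgba(filter_gray)
    h, w = in_rgba.shape[:2]

    # Read-only input images are created directly from the numpy arrays.
    in_img = cl.image_from_array(ctx, in_rgba, num_channels=4)
    filter_img = cl.image_from_array(ctx, filter_rgba, num_channels=4)

    # The output image only needs a format and a (width, height) shape.
    fmt = cl.ImageFormat(cl.channel_order.RGBA, cl.channel_type.FLOAT)
    out_img = cl.Image(ctx, cl.mem_flags.WRITE_ONLY, fmt, shape=(w, h))

    program.gaussian(queue, (w, h), None, in_img, filter_img, out_img,
                     np.int32(w), np.int32(filter_rgba.shape[0]))

    # Copy the output image into a numpy array of matching layout.
    out_rgba = np.empty_like(in_rgba)
    cl.enqueue_copy(queue, out_rgba, out_img, origin=(0, 0), region=(w, h))
    return out_rgba[..., 0]
```

The returned array is an ordinary numpy array, so any further host-side computation can continue from there.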

pylipp