The program is an Android application that needs to process the frames of an mp4 video. The initialization can be found in this question: init code

But I found out that the init code only works with kernels that do not use for loops. When a kernel has a nested for loop, as in the example at the bottom of the post, only the first image is processed and all the others are black.

Simple example:

1)
init function
execute kernel with for loops
remove opencl function

=> all goes fine

2)
init function 
execute kernel **without** for loops
execute kernel **without** for loops on next frame
remove opencl function

=> all goes fine

3)
init function
execute kernel **with** for loops
execute kernel **with** for loops on next frame
remove opencl function

=> first frame is processed, 2nd frame is black

edge kernel:

__kernel void edgeKernel(__read_only  image2d_t  srcImage,
                          __write_only image2d_t  dstImage)
{    
    const sampler_t sampler = CLK_NORMALIZED_COORDS_TRUE |
                               CLK_ADDRESS_REPEAT         |
                               CLK_FILTER_NEAREST;
    int x = get_global_id(0);
    int y = get_global_id(1);
    int2 coords = (int2) (x,y);

    int i = 0;
    int j = 0;
    float4 bufferPixel,currentPixel;
    float sum = 0;
    int counter = 0;
    const float edgeKernel[9] = {0.0f,1.0f,0.0f,1.0f,-4.0f,1.0f,0.0f,1.0f,0.0f};
    currentPixel = read_imagef(srcImage,sampler,coords);
    for(i=-1;i<=1;i++)
    {
        for(j=-1;j<=1;j++)
        {
            coords = (int2)((x+i),(y+j));
            bufferPixel = read_imagef(srcImage,sampler,coords);
            //sum = sum + (bufferPixel.y * edgeKernel[counter]);
            sum = mad(bufferPixel.y,edgeKernel[counter],sum);
            counter++;
        }
    }
    if(sum>255) sum=255;
    if(sum<0) sum=0;

    currentPixel.x=sum;
    currentPixel.y=sum;
    currentPixel.z=sum;

    write_imagef(dstImage,coords,currentPixel);                           
}

I cannot see what's wrong. I guess it has something to do with overlapping memory allocation, but I'm clueless about what the solution might be.
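
One way to narrow this down is to compare each GPU frame against a plain CPU version of the same 3x3 filter. The sketch below is only an illustration, not code from the linked project: `edge_reference`, `clamp_index` and the `max_value` parameter are hypothetical names, and it assumes a single-channel, row-major float buffer (for example the green channel that the kernel reads via `bufferPixel.y`). It deliberately differs from the kernel in two places. Borders are clamped to the edge, because the OpenCL specification leaves the result of `read_imagef` with integer coordinates undefined unless the sampler uses `CLK_NORMALIZED_COORDS_FALSE` and a clamp or none addressing mode. And the result is written at (x, y), whereas `coords` in the kernel holds (x+1, y+1) once the loops have finished.

```c
/* Clamp an index into [0, n-1], similar in spirit to CLK_ADDRESS_CLAMP_TO_EDGE. */
static int clamp_index(int v, int n) {
    if (v < 0) return 0;
    if (v >= n) return n - 1;
    return v;
}

/* Hypothetical CPU reference for the 3x3 edge filter.
 * src/dst: single-channel, row-major float images of width*height pixels.
 * max_value: upper clamp bound (assumption: 1.0f for normalized data,
 * 255.0f if the data really is in the 0..255 range). */
void edge_reference(const float *src, float *dst,
                    int width, int height, float max_value) {
    static const float weights[9] = {0.0f, 1.0f, 0.0f,
                                     1.0f, -4.0f, 1.0f,
                                     0.0f, 1.0f, 0.0f};
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float sum = 0.0f;
            int counter = 0;
            /* Same traversal order as the kernel: i offsets x, j offsets y. */
            for (int i = -1; i <= 1; i++) {
                for (int j = -1; j <= 1; j++) {
                    int sx = clamp_index(x + i, width);
                    int sy = clamp_index(y + j, height);
                    sum += src[sy * width + sx] * weights[counter];
                    counter++;
                }
            }
            if (sum > max_value) sum = max_value;
            if (sum < 0.0f) sum = 0.0f;
            /* Write at the pixel being processed, not at the last neighbour. */
            dst[y * width + x] = sum;
        }
    }
}
```

If the first frame matches this reference and the second one does not, the difference is more likely in how the buffers are reused between frames than in the filter arithmetic itself.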

Edit1:

OpenCL code is the link to the OpenCL code. Look for the initOpenCL and nativeImage2DOpenCL functions; you should be able to find everything you need there.

DeGoosseZ
  • Are you waiting until the first one finishes before putting the second frame in memory? I don't see any reason why the kernel would not work in a second run, since the code is ok and it works for the first frame. I think your problem is in how you run the kernels, not in the kernel itself. – DarkZeros May 30 '14 at 12:37
  • If one frame is processed correctly, that means that the kernel is ok. How are you getting frames from the decoder? – Roman Arzumanyan May 30 '14 at 15:25
  • DarkZeros: I've added a link to the OpenCL code. All the OpenCL code can be found there. If you need anything more specific, just ask and I'll provide. Roman: the decoder should be right, if I try with my inverse kernel it processes everything. The problem only appears when I use a for loop in my opencl kernel. – DeGoosseZ May 30 '14 at 15:42
  • I did not really solve it, at least not for all cases (I have multiple filters that are called with the same code). I went to the OpenCL IRC channel (lots of helpful people there) and found someone (nick: sharpneli) who helped me debug, and we concluded that the driver quality must be less than 100%. So you can try changing things in your kernel and testing again, by trial and error, and hope that you find the kernel your driver likes the most. A good thing to try is to turn off all the compiler optimizations. It helped me for one kernel. – DeGoosseZ Jul 08 '14 at 22:49
