When I use Aparapi with an AMD Radeon R7 450 graphics card and older drivers, the `size` parameter in the code below can be as large as 268,435,455. That is one less than the card's maximum 2D image size, 16384 x 16384 = 268,435,456 (screenshots below).

With my usual card, an AMD Radeon RX 5700 XT, which reports the same 2D image size of 16384 x 16384 = 268,435,456, I get this error:

`Total Local Kernel Size: Exceeds Maximum Allowed Local Kernel Size: 256 failed [ERROR] Failed to execute command`

In other words, `size` cannot be greater than 256. The same thing happens with an NVIDIA GeForce RTX 3060 Ti, even though it reports a 2D image size of 32768 x 32768 = 1,073,741,824.

What could be the problem? Why do the newer video cards perform worse in this case?

Code:

        int size = 268435455;
        double[] a = new double[size];
        double[] b = new double[size];
        double[] c = new double[size];

        for (int i = 0; i < size; i++)
        {
            a[i] = i;
            b[i] = i;
        }

        Kernel kernel = new Kernel()
        {
            @Override 
            public void run() 
            {
                int gid = getGlobalId();                
                c[gid] = a[gid] + b[gid];
            }            
        };

        kernel.execute(size);          
        kernel.dispose();

[Screenshots from GPU Caps Viewer: AMD Radeon R7 450, AMD Radeon RX 5700 XT, NVIDIA GeForce RTX 3060 Ti]

ADDITION:

We managed to work around the problem as follows. With this code, `size` can be greater than 256:

        Range range = Range.create2D(size, 1);
        kernel.execute(range);
        kernel.dispose();
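
The comments below mention that `Range.create(size, localSize)` accepts an explicit local size as long as `size % localSize == 0`. Here is a minimal sketch of picking such a value, assuming the limit is the 256 reported in the error message. The `LocalSizePicker` class and its `largestLocalSize` method are hypothetical names, not part of Aparapi, and the Aparapi call appears only in a comment:

```java
// Hypothetical helper, not part of Aparapi: picks a work-group (local) size
// that divides the global size and does not exceed the given limit.
class LocalSizePicker {

    /**
     * Returns the largest divisor of globalSize that is <= maxLocalSize.
     * Falls back to 1, which always divides globalSize.
     */
    static int largestLocalSize(int globalSize, int maxLocalSize) {
        if (globalSize <= 0 || maxLocalSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        for (int candidate = Math.min(globalSize, maxLocalSize); candidate > 1; candidate--) {
            if (globalSize % candidate == 0) {
                return candidate;
            }
        }
        return 1;
    }

    public static void main(String[] args) {
        // Assumption: 256 is taken from the error message; the real limit depends
        // on the device, the driver, and the kernel's resource usage.
        int maxLocalSize = 256;

        System.out.println(largestLocalSize(10240, maxLocalSize));     // prints 256
        System.out.println(largestLocalSize(268435455, maxLocalSize)); // prints 215

        // With Aparapi on the classpath, the result would be passed as:
        //   kernel.execute(Range.create(size, largestLocalSize(size, maxLocalSize)));
    }
}
```

Note that 268,435,455 is 2^28 - 1, an odd number, so no power-of-two local size divides it. The largest divisor that fits under 256 is 215.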

But if the arrays are two-dimensional, as in the code below, then `Range.create2D(size, 1)` works with a size greater than 256 on the AMD Radeon RX 5700 XT, but not on the NVIDIA GeForce RTX 3060 Ti. On the NVIDIA GeForce RTX 3060 Ti, a size greater than 256 still does not work.

        Kernel kernel = new Kernel()
        {
            @Override 
            public void run() 
            {
                int gid = getGlobalId();
                
                for (int i = 0; i < size; i++)
                {                  
                    c[i][gid] = a[i][gid] + b[i][gid];
                }
            } 
        };
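
Another way to keep the work-group size at or below 256 for an arbitrary `size` is to round the global size up to a multiple of the local size and let the kernel skip the extra work-items. Below is a minimal sketch of that arithmetic in plain Java, with the guarded kernel body emulated on the CPU. `PaddedRange`, `roundUp`, and `addGuarded` are hypothetical names, the Aparapi calls appear only in comments, and whether this avoids the error on the RTX 3060 Ti is untested:

```java
// Hypothetical sketch, not part of Aparapi: pad the global size so that a
// fixed local size divides it, and guard the kernel body against the padding.
class PaddedRange {

    /** Rounds globalSize up to the next multiple of localSize. */
    static int roundUp(int globalSize, int localSize) {
        if (globalSize <= 0 || localSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long padded = ((long) globalSize + localSize - 1) / localSize * localSize;
        return Math.toIntExact(padded); // throws if the padded size overflows int
    }

    /** CPU emulation of the guarded kernel: one loop iteration per work-item. */
    static double[] addGuarded(double[] a, double[] b, int paddedSize) {
        double[] c = new double[a.length];
        for (int gid = 0; gid < paddedSize; gid++) {
            // In an Aparapi kernel: int gid = getGlobalId(); if (gid < size) { ... }
            if (gid < a.length) {
                c[gid] = a[gid] + b[gid];
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int localSize = 256; // assumption: the limit from the error message
        System.out.println(roundUp(268435455, localSize)); // prints 268435456

        double[] a = {1, 2, 3, 4, 5};
        double[] b = {10, 20, 30, 40, 50};
        double[] c = addGuarded(a, b, roundUp(a.length, 4));
        System.out.println(java.util.Arrays.toString(c));

        // With Aparapi on the classpath, the launch would be:
        //   kernel.execute(Range.create(roundUp(size, localSize), localSize));
    }
}
```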

Perhaps AMD video cards are preferable to NVIDIA for Aparapi. Keep this in mind when buying a video card.

forreg16
  • It sounds like the problem here is that you're confusing maximum **image** size with maximum **work-group** size. I'm unfortunately not familiar with the OpenCL wrapper you're using, but make sure you don't try to make the group size the same as the total work size when enqueuing. – pmdj Mar 17 '23 at 14:18
  • @pmdj I think you are right. **work-group** is called **localSize** if I'm not mistaken. Aparapi automatically chooses the best localSize. You can set it manually with `create(size, localSize)` as long as `size % localSize == 0`. In my case I have size of 10240 and the chosen localSize is 640. But when Aparapi tries to run the code I get the error: `!!!!!!! Kernel overall local size: 640 exceeds maximum kernel allowed local size of: 256 failed (null)` The question is: Can we somehow change this maximum kernel allowed local size? – Trayan Momkov Mar 19 '23 at 16:03
  • The local size is a limitation of the hardware & driver, and also the resource usage of the kernel itself. You don’t need any work groups at all (group size 1) if you don’t use local memory. This won’t have any negative performance impact. – pmdj Mar 20 '23 at 08:11
  • The localSize matters. 256 is definitely faster than 1 in my case. – Trayan Momkov Apr 09 '23 at 13:30
