OpenCL can not detect my AMD GPU using OpenCV

Question

I am using an AMD Radeon R9 M375. I tried following this answer https://stackoverflow.com/a/34250412/8731839 but it didn't work for me.

I followed this: http://answers.opencv.org/question/108646/opencl-can-not-detect-my-nvidia-gpu-via-opencv/?answer=108784#post-id-108784

Here is my output from clinfo.exe:

  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               2
  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    AMD Radeon (TM) R9 M375
  Device Topology:               PCI[ B#4, D#0, F#0 ]
  Max compute units:                 10
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1015Mhz
  Address bits:                  32
  Max memory allocation:             3019898880
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                3221225472
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Max pipe arguments:                0
  Max pipe active reservations:          0
  Max pipe packet size:              0
  Max global variable size:          0
  Max global variable preferred total size:  0
  Max read/write image args:             0
  Max on device events:              0
  Queue on device max size:          0
  Max on device queues:              0
  Queue on device preferred size:        0
  SVM capabilities:              
    Coarse grain buffer:             No
    Fine grain buffer:               No
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                No
    Profiling :                  No
  Platform ID:                   00007FFF209D0188
  Name:                      Capeverde
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                2348.3
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (2348.3)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event cl_amd_liquid_flash


      Device Type:                   CL_DEVICE_TYPE_CPU
      Vendor ID:                     1002h
      Board name:                    
      Max compute units:                 4
      Max work items dimensions:             3
        Max work items[0]:               1024
        Max work items[1]:               1024
        Max work items[2]:               1024
      Max work group size:               1024
      Preferred vector width char:           16
      Preferred vector width short:          8
      Preferred vector width int:            4
      Preferred vector width long:           2
      Preferred vector width float:          8
      Preferred vector width double:         4
      Native vector width char:          16
      Native vector width short:             8
      Native vector width int:           4
      Native vector width long:          2
      Native vector width float:             8
      Native vector width double:            4
      Max clock frequency:               2200Mhz
      Address bits:                  64
      Max memory allocation:             2147483648
      Image support:                 Yes
      Max number of images read arguments:       128
      Max number of images write arguments:      64
      Max image 2D width:                8192
      Max image 2D height:               8192
      Max image 3D width:                2048
      Max image 3D height:               2048
      Max image 3D depth:                2048
      Max samplers within kernel:            16
      Max size of kernel argument:           4096
      Alignment (bits) of base address:      1024
      Minimum alignment (bytes) for any datatype:    128
      Single precision floating point capability
        Denorms:                     Yes
        Quiet NaNs:                  Yes
        Round to nearest even:           Yes
        Round to zero:               Yes
        Round to +ve and infinity:           Yes
        IEEE754-2008 fused multiply-add:         Yes
      Cache type:                    Read/Write
      Cache line size:               64
      Cache size:                    32768
      Global memory size:                8499593216
      Constant buffer size:              65536
      Max number of constant args:           8
      Local memory type:                 Global
      Local memory size:                 32768
      Max pipe arguments:                16
      Max pipe active reservations:          16
      Max pipe packet size:              2147483648
      Max global variable size:          1879048192
      Max global variable preferred total size:  1879048192
      Max read/write image args:             64
      Max on device events:              0
      Queue on device max size:          0
      Max on device queues:              0
      Queue on device preferred size:        0
      SVM capabilities:              
        Coarse grain buffer:             No
        Fine grain buffer:               No
        Fine grain system:               No
        Atomics:                     No
      Preferred platform atomic alignment:       0
      Preferred global atomic alignment:         0
      Preferred local atomic alignment:      0
      Kernel Preferred work group size multiple:     1
      Error correction support:          0
      Unified memory for Host and Device:        1
      Profiling timer resolution:            465
      Device endianess:              Little
      Available:                     Yes
      Compiler available:                Yes
      Execution capabilities:                
        Execute OpenCL kernels:          Yes
        Execute native function:             Yes
      Queue on Host properties:              
        Out-of-Order:                No
        Profiling :                  Yes
      Queue on Device properties:                
        Out-of-Order:                No
        Profiling :                  No
      Platform ID:                   00007FFF209D0188
      Name:                      Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
      Vendor:                    GenuineIntel
      Device OpenCL C version:           OpenCL C 1.2 
      Driver version:                2348.3 (sse2,avx)
      Profile:                   FULL_PROFILE
      Version:                   OpenCL 1.2 AMD-APP (2348.3)

What works:

#include <iostream>
#include <vector>
#include <opencv2/core/ocl.hpp>

int main()
{
    std::vector<cv::ocl::PlatformInfo> platforms;
    cv::ocl::getPlatfomsInfo(platforms);  // "Platfoms" is the spelling used by the OpenCV API

    // Iterate over the OpenCL platforms
    for (size_t i = 0; i < platforms.size(); i++)
    {
        // Access the platform
        const cv::ocl::PlatformInfo* platform = &platforms[i];

        // Platform name
        std::cout << "Platform Name: " << platform->name().c_str() << "\n";

        // Iterate over the devices within this platform
        cv::ocl::Device current_device;
        for (int j = 0; j < platform->deviceNumber(); j++)
        {
            // Access the device
            platform->getDevice(current_device, j);
            // Device type
            int deviceType = current_device.type();
            // deviceNumber() is the number of devices on the platform, not an index
            std::cout << "Device Number: " << platform->deviceNumber() << "\n";
            std::cout << "Device Type: " << deviceType << "\n";
        }
    }
    return 0;
}

The above code displays:

 Platform Name: Intel(R) OpenCL
 Device Number: 2
 Device Type: 2
 Device Number: 2
 Device Type: 4 
 Platform Name: AMD Accelerated Parallel Processing
 Device Number: 2
 Device Type: 4 
 Device Number: 2
 Device Type: 2
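
For reference, the integer returned by `cv::ocl::Device::type()` is a bit mask that follows the OpenCL `cl_device_type` flags, where 2 means CPU and 4 means GPU. Read that way, the output above lists one GPU and one CPU under each platform. The sketch below decodes the value. `deviceTypeName` is a hypothetical helper, not part of OpenCV, and the numeric constants are written out by hand on the assumption that they match `CL_DEVICE_TYPE_CPU`, `CL_DEVICE_TYPE_GPU`, and `CL_DEVICE_TYPE_ACCELERATOR`.

```cpp
#include <string>

// Hypothetical helper (not an OpenCV function): turn the integer from
// cv::ocl::Device::type() into a readable label.
// Assumed bit values, following the OpenCL cl_device_type flags:
//   CPU = 1 << 1 (2), GPU = 1 << 2 (4), ACCELERATOR = 1 << 3 (8).
std::string deviceTypeName(int type)
{
    if (type & (1 << 2)) return "GPU";
    if (type & (1 << 1)) return "CPU";
    if (type & (1 << 3)) return "ACCELERATOR";
    return "UNKNOWN";
}
```

In the loop above, printing `deviceTypeName(deviceType)` instead of the raw integer would make it easier to see which entry is the GPU of each platform.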

How do I create a Context from here that uses the AMD GPU? The linked post says to use the method `initializeContextFromHandler`, but the OpenCV documentation for it is insufficient. Documentation Link

Include more info about what exactly you did and what error messages you got, to make this a [mcve]. - Peter Cordes, Feb 26 '18 at 23:59
Do you want to run custom kernels or just the OpenCL-accelerated OpenCV parts? If the latter, you shouldn't be setting any context; just follow https://opencv.org/platforms/opencl.html - aram, Feb 27 '18 at 12:41
@Aram I want to accelerate my array (UMat) manipulation functions using OpenCL. I am aware of the page you linked and my code follows that format, but when I run it, it shows that my Intel HD Graphics is being used instead of AMD. I want to switch to AMD for particular parts of my code. I use `setUseOpenCL(bool foo)` to switch between CPU and GPU. - Shubham, Feb 27 '18 at 13:12
Have a look at https://github.com/opencv/opencv/issues/6926. I never used it, but it looks like they create a context and set a device in the commented answer. - Micka, Feb 27 '18 at 13:15
I tried it but it didn't work for me. I think the problem is that `cv::ocl::Device::TYPE_GPU` doesn't detect AMD. I think if I have to use AMD, I will have to use the `initializeContextFromHandler` function in the `ocl.hpp` file, but I am having trouble working out what information it actually needs. I can't understand its documentation and feel it is insufficient. - Shubham, Feb 27 '18 at 13:38

score 1 · Accepted Answer · answered Mar 01 '18 at 14:44

Issue is resolved. I don't know what I did but AMD is working now.

Current settings (On Windows):

Environment Variable:

Name: OPENCV_OPENCL_DEVICE

Value: AMD:GPU:Capeverde
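
The same setting can also be applied from inside the program instead of through the system settings. The following is a minimal sketch, not a confirmed OpenCV recipe: it assumes OpenCV reads `OPENCV_OPENCL_DEVICE` only when it first initializes OpenCL, so the variable has to be set before any OpenCV call that touches OpenCL. `setProcessEnv` and `selectAmdGpu` are hypothetical helper names. `_putenv_s` (Windows CRT) and `setenv` (POSIX) are the standard calls for setting a variable in the current process.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: set an environment variable for the current process.
// Returns true on success.
bool setProcessEnv(const std::string& name, const std::string& value)
{
#ifdef _WIN32
    return _putenv_s(name.c_str(), value.c_str()) == 0;
#else
    return setenv(name.c_str(), value.c_str(), 1) == 0;  // 1 = overwrite
#endif
}

// Hypothetical helper: request the device named in this answer.
// Assumption: must run before OpenCV initializes OpenCL.
bool selectAmdGpu()
{
    return setProcessEnv("OPENCV_OPENCL_DEVICE", "AMD:GPU:Capeverde");
}
```

A call to `selectAmdGpu()` would go at the very start of `main`, before the first `cv::UMat` operation. On Windows this may not take effect if OpenCV is linked against a different C runtime than the application, in which case the system-level variable above is the safer route.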

I use setUseOpenCL(bool foo), declared in ocl.hpp, to select whether OpenCV runs on the GPU (OpenCL enabled) or on the CPU (OpenCL disabled).

Most likely problem: my actual code wasn't doing any computation. When I wrote a simple program that subtracts two matrices, the AMD GPU started working.

Code:

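#include <cstdio>
#include <iostream>
#include <opencv2/core/ocl.hpp>
#include <opencv2/opencv.hpp>

int main() {
    // UMat operations can be dispatched to OpenCL when it is available and enabled
    cv::UMat mat1 = cv::UMat::ones(10, 10, CV_32F);
    cv::UMat mat2 = cv::UMat::zeros(10, 10, CV_32F);
    cv::UMat output = cv::UMat(10, 10, CV_32F);
    cv::subtract(mat1, mat2, output);  // element-wise mat1 - mat2
    std::cout << output << "\n";
    std::getchar();  // keep the console window open until Enter is pressed
    return 0;
}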
