I am using an AMD Radeon R9 M375 GPU. I tried following this answer https://stackoverflow.com/a/34250412/8731839 but it didn't work for me.

I followed this: http://answers.opencv.org/question/108646/opencl-can-not-detect-my-nvidia-gpu-via-opencv/?answer=108784#post-id-108784

Here is my output from clinfo.exe:

  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               2
  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    AMD Radeon (TM) R9 M375
  Device Topology:               PCI[ B#4, D#0, F#0 ]
  Max compute units:                 10
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1015Mhz
  Address bits:                  32
  Max memory allocation:             3019898880
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                3221225472
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Max pipe arguments:                0
  Max pipe active reservations:          0
  Max pipe packet size:              0
  Max global variable size:          0
  Max global variable preferred total size:  0
  Max read/write image args:             0
  Max on device events:              0
  Queue on device max size:          0
  Max on device queues:              0
  Queue on device preferred size:        0
  SVM capabilities:              
    Coarse grain buffer:             No
    Fine grain buffer:               No
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                No
    Profiling :                  No
  Platform ID:                   00007FFF209D0188
  Name:                      Capeverde
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                2348.3
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (2348.3)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event cl_amd_liquid_flash


      Device Type:                   CL_DEVICE_TYPE_CPU
      Vendor ID:                     1002h
      Board name:                    
      Max compute units:                 4
      Max work items dimensions:             3
        Max work items[0]:               1024
        Max work items[1]:               1024
        Max work items[2]:               1024
      Max work group size:               1024
      Preferred vector width char:           16
      Preferred vector width short:          8
      Preferred vector width int:            4
      Preferred vector width long:           2
      Preferred vector width float:          8
      Preferred vector width double:         4
      Native vector width char:          16
      Native vector width short:             8
      Native vector width int:           4
      Native vector width long:          2
      Native vector width float:             8
      Native vector width double:            4
      Max clock frequency:               2200Mhz
      Address bits:                  64
      Max memory allocation:             2147483648
      Image support:                 Yes
      Max number of images read arguments:       128
      Max number of images write arguments:      64
      Max image 2D width:                8192
      Max image 2D height:               8192
      Max image 3D width:                2048
      Max image 3D height:               2048
      Max image 3D depth:                2048
      Max samplers within kernel:            16
      Max size of kernel argument:           4096
      Alignment (bits) of base address:      1024
      Minimum alignment (bytes) for any datatype:    128
      Single precision floating point capability
        Denorms:                     Yes
        Quiet NaNs:                  Yes
        Round to nearest even:           Yes
        Round to zero:               Yes
        Round to +ve and infinity:           Yes
        IEEE754-2008 fused multiply-add:         Yes
      Cache type:                    Read/Write
      Cache line size:               64
      Cache size:                    32768
      Global memory size:                8499593216
      Constant buffer size:              65536
      Max number of constant args:           8
      Local memory type:                 Global
      Local memory size:                 32768
      Max pipe arguments:                16
      Max pipe active reservations:          16
      Max pipe packet size:              2147483648
      Max global variable size:          1879048192
      Max global variable preferred total size:  1879048192
      Max read/write image args:             64
      Max on device events:              0
      Queue on device max size:          0
      Max on device queues:              0
      Queue on device preferred size:        0
      SVM capabilities:              
        Coarse grain buffer:             No
        Fine grain buffer:               No
        Fine grain system:               No
        Atomics:                     No
      Preferred platform atomic alignment:       0
      Preferred global atomic alignment:         0
      Preferred local atomic alignment:      0
      Kernel Preferred work group size multiple:     1
      Error correction support:          0
      Unified memory for Host and Device:        1
      Profiling timer resolution:            465
      Device endianess:              Little
      Available:                     Yes
      Compiler available:                Yes
      Execution capabilities:                
        Execute OpenCL kernels:          Yes
        Execute native function:             Yes
      Queue on Host properties:              
        Out-of-Order:                No
        Profiling :                  Yes
      Queue on Device properties:                
        Out-of-Order:                No
        Profiling :                  No
      Platform ID:                   00007FFF209D0188
      Name:                      Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
      Vendor:                    GenuineIntel
      Device OpenCL C version:           OpenCL C 1.2 
      Driver version:                2348.3 (sse2,avx)
      Profile:                   FULL_PROFILE
      Version:                   OpenCL 1.2 AMD-APP (2348.3)

What works:

std::vector<cv::ocl::PlatformInfo> platforms;
cv::ocl::getPlatfomsInfo(platforms);

//OpenCL Platforms
for (size_t i = 0; i < platforms.size(); i++)
{

    //Access to Platform
    const cv::ocl::PlatformInfo* platform = &platforms[i];

    //Platform Name
    std::cout << "Platform Name: " << platform->name().c_str() << "\n";
    //Access Device within Platform
    cv::ocl::Device current_device;
    for (int j = 0; j < platform->deviceNumber(); j++)
    {
        //Access Device
        platform->getDevice(current_device, j);
        //Device Type
        int deviceType = current_device.type();
        // Note: this prints the platform's device count, not the index j
        std::cout << "Device Number: " << platform->deviceNumber() << std::endl;
        std::cout << "Device Type: " << deviceType << std::endl;
    }
}

The above code displays:

 Platform Name: Intel(R) OpenCL
 Device Number: 2
 Device Type: 2
 Device Number: 2
 Device Type: 4 
 Platform Name: AMD Accelerated Parallel Processing
 Device Number: 2
 Device Type: 4 
 Device Number: 2
 Device Type: 2 
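
The integers printed for Device Type are OpenCL device-type bit flags: 2 is CL_DEVICE_TYPE_CPU and 4 is CL_DEVICE_TYPE_GPU, so the output above shows one CPU device and one GPU device on each platform. A minimal sketch for decoding them follows; deviceTypeName and the kType... constants are hypothetical names introduced only for illustration, and the values are assumed to match cv::ocl::Device::TYPE_CPU and TYPE_GPU:

```cpp
#include <string>

// Bit-flag values as defined for CL_DEVICE_TYPE_* in the OpenCL headers.
// Assumption: cv::ocl::Device::type() returns the same flags.
constexpr int kTypeDefault     = 1 << 0; // 1
constexpr int kTypeCpu         = 1 << 1; // 2
constexpr int kTypeGpu         = 1 << 2; // 4
constexpr int kTypeAccelerator = 1 << 3; // 8

// Hypothetical helper: maps the integer device type to a readable label.
// GPU is checked first so that values carrying extra GPU sub-type bits
// still report "GPU".
std::string deviceTypeName(int type)
{
    if (type & kTypeGpu)         return "GPU";
    if (type & kTypeCpu)         return "CPU";
    if (type & kTypeAccelerator) return "ACCELERATOR";
    if (type & kTypeDefault)     return "DEFAULT";
    return "UNKNOWN";
}
```

In the loop above, printing deviceTypeName(current_device.type()) instead of the raw integer would make it easier to see which entry is the AMD GPU.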

How do I create a Context from here that uses the AMD GPU? The linked post says to use the method initializeContextFromHandler, but the OpenCV documentation for it is insufficient. Documentation Link

Shubham
  • Include more info about what exactly you did and what error messages you got, to make this a [mcve]. – Peter Cordes Feb 26 '18 at 23:59
  • @PeterCordes done. – Shubham Feb 27 '18 at 00:34
  • can you add a cout for platform->deviceNumber() and type? – Micka Feb 27 '18 at 05:10
  • Do you want to run custom kernels or just the OpenCL accelerated opencv parts? because if the latter you shouldn't be setting any context and using https://opencv.org/platforms/opencl.html – aram Feb 27 '18 at 12:41
  • @Micka added more info. – Shubham Feb 27 '18 at 13:08
  • @Aram I want to accelerate my array(UMat) manipulation functions using OpenCL. I am aware about the page you linked and my code uses that format only but when I run it, it shows that my Intel HD Graphics are being used instead of AMD. I want to switch to AMD for particular parts of my code. I use `setUseOpenCL(bool foo)` to switch between CPU and GPU. – Shubham Feb 27 '18 at 13:12
  • have a look at https://github.com/opencv/opencv/issues/6926 I never used it, but it looks like they create a context and set a device in the commented answer – Micka Feb 27 '18 at 13:15
  • I tried it but it didn't work for me. I think the problem is that `cv::ocl::Device::TYPE_GPU` doesn't detect AMD. I think if I have to use AMD, I will have to use `initializeContextFromHandler` function in `ocl.hpp` file but I am running into trouble as to what information it actually needs. I can't understand its documentation and feel it is insufficient. – Shubham Feb 27 '18 at 13:38

1 Answer

The issue is resolved. I don't know exactly what fixed it, but the AMD GPU is working now.

Current settings (On Windows):

  1. Environment Variable:

    Name: OPENCV_OPENCL_DEVICE
    
    Value: AMD:GPU:Capeverde
    
  2. Calling setUseOpenCL(bool foo), declared in ocl.hpp, to select whether the GPU or the CPU is used.
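
The value of OPENCV_OPENCL_DEVICE has three colon-separated fields: platform, device type, and device name or ID. Here "AMD" matches the platform name, "GPU" the device type, and "Capeverde" the Name field reported by clinfo. The sketch below only illustrates that layout; DeviceSpec and parseDeviceSpec are hypothetical names, and this is not how OpenCV itself parses the variable:

```cpp
#include <sstream>
#include <string>

// Hypothetical struct holding the three colon-separated fields.
struct DeviceSpec {
    std::string platform; // e.g. "AMD"
    std::string type;     // e.g. "GPU"
    std::string device;   // e.g. "Capeverde"
};

// Splits a value such as "AMD:GPU:Capeverde" into its three fields.
// Missing fields are left as empty strings.
DeviceSpec parseDeviceSpec(const std::string& value)
{
    DeviceSpec spec;
    std::istringstream in(value);
    std::getline(in, spec.platform, ':'); // text before the first ':'
    std::getline(in, spec.type, ':');     // text between the two ':'
    std::getline(in, spec.device);        // the rest of the string
    return spec;
}
```

For example, parseDeviceSpec("AMD:GPU:Capeverde") yields platform "AMD", type "GPU", and device "Capeverde".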

Most likely problem: my actual code wasn't doing any computation. When I wrote a simple program that subtracts two matrices, the AMD GPU started working.

Code:

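#include <cstdio>
#include <iostream>

#include <opencv2/core/ocl.hpp>
#include <opencv2/opencv.hpp>

int main() {
    // UMat operands let OpenCV run the operation through OpenCL when it is enabled
    cv::UMat mat1 = cv::UMat::ones(10, 10, CV_32F);
    cv::UMat mat2 = cv::UMat::zeros(10, 10, CV_32F);
    cv::UMat output = cv::UMat(10, 10, CV_32F);
    // An actual computation, so the selected device has work to do
    cv::subtract(mat1, mat2, output);
    std::cout << output << "\n";
    // Wait for a key press so the console window stays open
    std::getchar();
    return 0;
}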
Shubham