12

Is there a way to get OpenCL to give me a list of all unique physical devices which have an OpenCL implementation available? I know how to iterate through the platform/device list but for instance, in my case, I have one Intel-provided platform which gives me an efficient device implementation for my CPU, and the APP platform which provides a fast implementation for my GPU but a terrible implementation for my CPU.

Is there a way to work out that the two CPU devices are in fact the same physical device, so that I can choose the most efficient one and work with that, instead of using both and having them contend with each other for compute time on the single physical device?

I have looked at CL_DEVICE_VENDOR_ID and CL_DEVICE_NAME, but they don't solve my problem: CL_DEVICE_NAME is the same for two separate physical devices of the same model (dual GPUs), and CL_DEVICE_VENDOR_ID gives me a different ID for my CPU depending on the platform.

An ideal solution would be some sort of unique physical device ID, but I'd be happy with manually altering the OpenCL configuration to rearrange the devices myself (if such a thing is possible).

user
Thomas
  • I don't get the question. So you want to choose between two CPUs with identical specs? – ardiyu07 Jun 02 '12 at 02:14
  • I want to use all available physical devices (for an easily parallelizable problem), and I want to use only a single logical device per physical device, otherwise I get contention. – Thomas Jun 02 '12 at 02:31

5 Answers

5

As far as I have been able to investigate, there is no reliable solution. If all your work is done within a single process, you can use the order of entries returned by clGetDeviceIDs or the cl_device_id values themselves (essentially they are pointers), but things get worse if you try to share those identifiers between processes.

See this blog post about it, which says:

The issue is that if you have two identical GPUs, you can’t distinguish between them. If you call clGetDeviceIDs, the order in which they are returned is actually unspecified, so if the first process picks the first device and the second takes the second device, they both may wind up oversubscribing the same GPU and leaving the other one idle.

However, he notes that AMD and NVIDIA provide their own extensions, cl_amd_device_topology and cl_nv_device_attribute_query. You can check whether your device supports these extensions and then use them as follows (code by the original author):

// This cl_ext is provided as part of the AMD APP SDK
#include <CL/cl_ext.h>

cl_device_topology_amd topology;
status = clGetDeviceInfo (devices[i], CL_DEVICE_TOPOLOGY_AMD,
    sizeof(cl_device_topology_amd), &topology, NULL);

if(status != CL_SUCCESS) {
    // Handle error
}

if (topology.raw.type == CL_DEVICE_TOPOLOGY_TYPE_PCIE_AMD) {
    std::cout << "INFO: Topology: " << "PCI[ B#" << (int)topology.pcie.bus
        << ", D#" << (int)topology.pcie.device << ", F#"
        << (int)topology.pcie.function << " ]" << std::endl;
}

or (code by me, adapted from the above linked post):

#define CL_DEVICE_PCI_BUS_ID_NV  0x4008
#define CL_DEVICE_PCI_SLOT_ID_NV 0x4009

cl_int bus_id;
cl_int slot_id;

status = clGetDeviceInfo (devices[i], CL_DEVICE_PCI_BUS_ID_NV,
    sizeof(cl_int), &bus_id, NULL);
if (status != CL_SUCCESS) {
    // Handle error.
}

status = clGetDeviceInfo (devices[i], CL_DEVICE_PCI_SLOT_ID_NV,
    sizeof(cl_int), &slot_id, NULL);
if (status != CL_SUCCESS) {
    // Handle error.
}

std::cout << "Topology = [" << bus_id <<
                         ":"<< slot_id << "]" << std::endl;
firegurafiku
  • "the order in which they are returned is actually unspecified" wow! that's even worse than I expected. Anyway my question was not so much about multiple processes but more about different platforms exposing the same physical device (e.g. the intel SDK and the AMD SDK both exposing the same main CPU as a logical device in each of their respective platforms) but this topology extension resolves that as well. Thanks for the answer! – Thomas Mar 16 '16 at 05:09
  • @Thomas: You're welcome! BTW, the `clinfo` program should display topology identifiers for both NVIDIA and AMD devices. You should definitely [have a look](https://github.com/Oblomov/clinfo/blob/f9516865c0a47d2e2b24eb8371f0931792a23316/src/clinfo.c#L1048) at how they deal with it; their code seems to be better than mine. – firegurafiku Mar 16 '16 at 15:01
3
  • If you have two devices of the exact same kind belonging to a platform, you can tell them apart by the associated cl_device_id values returned by clGetDeviceIDs.

  • If you have devices that can be used by two different platforms you can eliminate the entries for the second platform by comparing the device names from CL_DEVICE_NAME.

  • If you want to find the intended platform for a device, compare the CL_PLATFORM_VENDOR and CL_DEVICE_VENDOR strings from clGetPlatformInfo() and clGetDeviceInfo respectively.

You can read all platforms and their associated devices into separate platform-specific lists and then eliminate duplicates by comparing the device names across those lists. This should ensure that you do not get the same device from different platforms.

Finally, you can pass options to your application, through command-line arguments or a configuration file for example, to associate devices of a certain type (CPU, GPU, accelerator) with a specific platform when more than one platform offers that device type. Hopefully this answers your question.
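The list-based de-duplication described above could be sketched roughly as follows. This is only an illustration: `LogicalDevice` and `dropCrossPlatformDuplicates` are hypothetical names, the records are assumed to have been filled from clGetPlatformInfo and clGetDeviceInfo beforehand, and it assumes that different platforms report exactly the same name string for the same hardware, which is not guaranteed.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one logical device; in real code the fields would
// be filled from clGetPlatformInfo (CL_PLATFORM_NAME) and clGetDeviceInfo
// (CL_DEVICE_NAME).
struct LogicalDevice {
    std::string platformName;
    std::string deviceName;
};

// Platforms are passed in order of preference (most preferred first).
// A device is kept unless an earlier platform already exposed a device with
// the same name. Same-named devices inside ONE platform are all kept, on the
// assumption that they are distinct physical devices (for example dual GPUs).
std::vector<LogicalDevice> dropCrossPlatformDuplicates(
        const std::vector<std::vector<LogicalDevice>>& platformsByPreference) {
    std::vector<LogicalDevice> kept;
    std::set<std::string> namesFromEarlierPlatforms;
    for (const auto& platformDevices : platformsByPreference) {
        std::set<std::string> namesInThisPlatform;
        for (const auto& device : platformDevices) {
            if (namesFromEarlierPlatforms.count(device.deviceName) == 0) {
                kept.push_back(device);
            }
            namesInThisPlatform.insert(device.deviceName);
        }
        // Only after the whole platform is processed do its names start
        // suppressing devices from later platforms.
        namesFromEarlierPlatforms.insert(namesInThisPlatform.begin(),
                                         namesInThisPlatform.end());
    }
    return kept;
}
```

Putting the preferred platform first (say, the Intel platform before the AMD one) means its CPU entry wins, while two identically named GPUs from the second platform both survive.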

Steinin
0

Anyway, assuming you are trying to get a unique ID for every device, you can simply query with clGetDeviceIDs:

cl_int clGetDeviceIDs(cl_platform_id platform,
                      cl_device_type device_type,
                      cl_uint num_entries,
                      cl_device_id *devices,
                      cl_uint *num_devices)

Your list of devices will then be written to the *devices array, and you can call clGetDeviceInfo() to find out which device you'd like to use.

ardiyu07
  • I want to use all of them but I don't want any physical device to be accessed by multiple logical devices. – Thomas Jun 02 '12 at 02:31
  • If I'm not getting it wrong, you want to run multiple threads with all devices working at the same time? If so, you may want to take a look at CUDA's Computing SDK for OpenCL, specifically the oclMultiThreads source code, where you can divide up the work manually and then run it asynchronously on the available devices. – ardiyu07 Jun 02 '12 at 02:36
  • No, my question is more subtle than that. I know I could list all devices and multithread them. But the issue is that a single physical device (say, my unique CPU) comes up as two logical devices (one in each OpenCL platform) - multithreading over the two logical devices will cause resource contention over the unique physical CPU (this is even more true for GPU's) so I want to detect that the two logical devices point to the same physical device and only use one of them. – Thomas Jun 02 '12 at 05:37
  • Ah, I see, I finally got your question. I don't know how you partitioned your device and told the parts to run the same OpenCL program, but there is also a way to divide a device into sub-devices with an OpenCL extension; you can take a look at the description here: http://www.khronos.org/registry/cl/extensions/ext/cl_ext_device_fission.txt. I think both Intel and AMD support it, but I can't guarantee it's compatible with your environment. – ardiyu07 Jun 02 '12 at 05:47
0

Combining the answers above, my solution was:

long bus = 0; // leave it 0 for Intel
// update bus for NVIDIA/AMD ...
// ...
long uid = (bus << 5) | device_type;

The variable bus was computed from the NVIDIA/AMD device-specific info queries, as firegurafiku mentioned; the variable device_type was the result of the clGetDeviceInfo(clDevice, CL_DEVICE_TYPE, sizeof(cl_device_type), &device_type, nullptr) API call, as Steinin suggested.

This approach solved the issue of an Intel CPU and its integrated GPU getting the same ID. Now both devices have unique identifiers, thanks to their different CL_DEVICE_TYPE values.

Surprisingly, when running code on an Oclgrind-emulated device, the Oclgrind simulator device also gets a unique identifier, 15, distinct from any other on my system.

The only case in which the proposed approach can fail is several CPUs of the same model on a single mainboard.
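To make the bit layout of that identifier concrete, here is a small self-contained sketch. The constants repeat the cl_device_type bit values from CL/cl.h so that it compiles without the OpenCL headers; `makeUid` and the `kType...` names are hypothetical, and the mask on the type bits is an addition that is not in the snippet above.

```cpp
#include <cstdint>

// Bit values of cl_device_type as defined in CL/cl.h, repeated here so the
// sketch compiles without the OpenCL headers.
constexpr std::uint64_t kTypeDefault     = 1u << 0;
constexpr std::uint64_t kTypeCpu         = 1u << 1;
constexpr std::uint64_t kTypeGpu         = 1u << 2;
constexpr std::uint64_t kTypeAccelerator = 1u << 3;
constexpr std::uint64_t kTypeCustom      = 1u << 4;

// The five type flags occupy the low 5 bits, which is why the bus number is
// shifted left by 5. The mask (an addition, not in the original snippet)
// keeps any unexpected higher type bits from overlapping the bus bits.
constexpr std::uint64_t makeUid(std::uint64_t bus, std::uint64_t deviceType) {
    return (bus << 5) | (deviceType & 0x1Fu);
}
```

With bus left at 0, a CPU maps to 2 and a GPU to 4, so a CPU and its integrated GPU no longer collide. A device that reports the default, CPU, GPU and accelerator flags together would map to 15, which would match the Oclgrind value mentioned above if Oclgrind sets those four flags (an assumption).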

Mykyta Kozlov
  • Things are now a bit simpler thanks to two Khronos extensions, if the device supports them. The first is `cl_khr_device_uuid`, which returns a unique ID as required. The second is `cl_khr_pci_bus_info`, which provides a platform-agnostic way to extract PCI bus info. An Intel Core i7-6700HQ with Intel HD Graphics and the latest drivers supports `cl_khr_pci_bus_info`. An NVIDIA GeForce 940MX with the latest drivers supports `cl_khr_device_uuid`. – Mykyta Kozlov Dec 24 '21 at 15:29
0

Benchmark each card for some value such as GFLOPS or pixels per second. Then benchmark the cards in pairs. If either device in a pair drops to about half of its normal performance, or if the pair's combined throughput is no higher than what one of them achieves alone, then they are the same physical device. Each benchmark could run for just a few milliseconds, so even a 40-GPU system would take only a few seconds to complete (brute-forcing all pairs takes 1600 runs). (This is for the case where clGetDeviceIDs fails at some point.)
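The decision rule for one pair might be sketched like this. It is a rough illustration only: `looksLikeSamePhysicalDevice` is a hypothetical name, the throughput numbers are assumed to come from your own benchmark kernel, and the halfway threshold is an arbitrary choice that would need tuning against real measurement noise.

```cpp
#include <algorithm>

// soloA and soloB are throughputs (for example GFLOPS) measured with each
// logical device running alone; combined is the total throughput measured
// with both running at the same time.
//
// Two independent physical devices should reach roughly soloA + soloB, while
// two logical devices backed by the same hardware should stay close to the
// better of the two solo results. The threshold is placed halfway between
// those two reference values.
bool looksLikeSamePhysicalDevice(double soloA, double soloB, double combined) {
    const double best = std::max(soloA, soloB);
    const double weaker = std::min(soloA, soloB);
    return combined < best + 0.5 * weaker;
}
```

For example, two logical devices that each reach 100 alone but only 104 together would be flagged as sharing hardware, while a pair reaching 195 together would not.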

huseyin tugrul buyukisik