Is cudaSetDevice() numbering consistent across processes?

Question

I want to call cudaGetDeviceCount(&N) in a parent process, then create N child processes, one per GPU found, and pass to each process (via command-line) a unique GPU number, so that effectively there will be one (and only one) process handling each GPU. I plan to call cudaSetDevice(i) in each process, with i received from the command line.

However, I worry that, for example, GPU #3 in one process may be GPU #4 in another process, while GPU #3 in the latter process is something completely different, such as GPU #1 from the former process.

Do you know if the GPU numbering is consistent within the whole system? Or does each process in general receive its own permutation of GPUs?

@Oblivion, sure, the processes are on the same machine. Just there are multiple GPUs. It's an Amazon's `p3.16xlarge` instance. — Serge Rogatch, Aug 25 '19 at 10:08
Related: [How does CUDA assign device IDs to GPUs?](https://stackoverflow.com/questions/13781738/how-does-cuda-assign-device-ids-to-gpus) — BlameTheBits, Aug 25 '19 at 12:18
@RobertCrovella do you mind if I edit your comment to my answer or you give an answer and I delete mine? — Oblivion, Aug 26 '19 at 16:44

Oblivion · Answer 1 · 2019-08-26T17:09:54.613


Edit

The numbering is consistent. To quote @Robert Crovella:

The ordering is consistent across processes, and consistent from run to run. This statement is true whether you select the default CUDA numbering, or the PCI based ordering. The run to run statement is true as long as you don't switch CUDA versions, update the system BIOS, change operating systems, change the hardware configuration of the system (e.g. add/remove devices), or change from default to PCI ordering. It also assumes you make no changes to the CUDA_VISIBLE_DEVICES environment variable.

The section "Device Enumeration and Properties" describes an environment variable named CUDA_DEVICE_ORDER with two possible values, FASTEST_FIRST and PCI_BUS_ID.

According to the documentation, FASTEST_FIRST causes CUDA to guess which device is fastest using a simple heuristic and make that device 0, leaving the order of the remaining devices unspecified. PCI_BUS_ID orders devices by PCI bus ID in ascending order.

By default, this environment variable is set to FASTEST_FIRST. The resulting device IDs can therefore differ from the ones you would get with PCI_BUS_ID if your devices happen to have different speeds.

You can set CUDA_DEVICE_ORDER via:

export CUDA_DEVICE_ORDER=PCI_BUS_ID

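With this setting, device IDs follow ascending PCI bus ID order, so a given ID always refers to the device installed in the same slot.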

Alternatively, look up the device ID in host code from the device's PCI bus ID:

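int dev = 0;
// "somebusId" is a placeholder for a PCI bus ID string, e.g. in the form [domain]:[bus]:[device].[function]
cudaError_t errCode = cudaDeviceGetByPCIBusId(&dev, "somebusId");
if (errCode == cudaSuccess)
    errCode = cudaSetDevice(dev);  // select the device only if the lookup succeeded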
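For the one-process-per-GPU plan in the question, the parent side could look roughly like the sketch below. It is an illustration under assumptions, not a definitive implementation: `spawn_one_worker_per_gpu` and `wait_for_workers` are hypothetical names, the launcher is POSIX-only (`fork`/`execlp`), and `gpu_count` is a plain parameter standing in for the value returned by `cudaGetDeviceCount()` so that the code compiles without the CUDA toolkit.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string>
#include <vector>

// Hypothetical launcher: starts one worker per GPU index in [0, gpu_count)
// and passes that index as argv[1]. In real use gpu_count would come from
// cudaGetDeviceCount() in the parent.
std::vector<pid_t> spawn_one_worker_per_gpu(int gpu_count, const char* worker) {
    std::vector<pid_t> pids;
    for (int i = 0; i < gpu_count; ++i) {
        const std::string index = std::to_string(i);
        const pid_t pid = fork();
        if (pid == 0) {
            // Child: replace the process image with the worker program.
            execlp(worker, worker, index.c_str(), static_cast<char*>(nullptr));
            _exit(127);  // reached only if exec failed
        }
        if (pid > 0) {
            pids.push_back(pid);  // parent: remember the child
        }
        // pid < 0: fork failed; this sketch simply skips that index.
    }
    return pids;
}

// Waits for every child and returns how many exited with status 0.
int wait_for_workers(const std::vector<pid_t>& pids) {
    int succeeded = 0;
    for (pid_t pid : pids) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            ++succeeded;
        }
    }
    return succeeded;
}
```

Each worker would then parse `argv[1]` and pass it to `cudaSetDevice()`. Because the children inherit the parent's environment, CUDA_DEVICE_ORDER and CUDA_VISIBLE_DEVICES are the same in every process, which is the condition under which the quoted statement says the numbering stays consistent.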

edited Aug 26 '19 at 17:09

answered Aug 25 '19 at 11:04

Oblivion


@Shadow I mean if I ask for a device ID 0 I will always get the same device installed on the same slot. I'm not sure if that Id changes from run to run if the Id was generated based on fastest device. I had problems in the past – Oblivion Aug 25 '19 at 12:26
@Shadow I have seen the oldest device chosen as device 0. I wouldn't rely on FASTEST_FIRST myself. Still, I'm not sure until I find some documentation. – Oblivion Aug 25 '19 at 17:01

@Shadow thanks to Robert we have a clarification on the issue. You may check the edit. – Oblivion Aug 26 '19 at 17:34
If a reproducible scenario is identified that demonstrates inconsistent device ordering from run to run as discussed, it may be a bug in CUDA. – Robert Crovella Aug 26 '19 at 17:35
@RobertCrovella I believe my observation falls under one of the scenarios you described. There was a driver version update, which explains it perfectly. – Oblivion Aug 26 '19 at 17:42
