
I'm running this command in a shell and get:

C:\Users\me>nvidia-smi -L    
GPU 0: Quadro K2000 (UUID: GPU-b1ac50d1-019c-58e1-3598-4877fddd3f17)    
GPU 1: Quadro 2000 (UUID: GPU-1f22a253-c329-dfb7-0db4-e005efb6a4c7)

But in my code, when I call cuDeviceGetName(.., ID), where ID is the index given in the nvidia-smi output, the devices are swapped: GPU 0 is the Quadro 2000 and GPU 1 is the Quadro K2000.

Is this expected behavior or a bug? Does anyone know a workaround to make nvidia-smi report the 'real' IDs of the GPUs? I could use the UUID to get the proper device with nvmlDeviceGetUUID(), but using the NVML API seems a bit too complicated for what I'm trying to achieve.

This question discusses how CUDA assigns IDs to devices, but without a clear conclusion.

I am using CUDA 6.5.

EDIT: I've had a look at the nvidia-smi manpage (should have done that earlier...). It states:

"It is recommended that users desiring consistency use either UUID or PCI bus ID, since device enumeration ordering is not guaranteed to be consistent"

Still looking for a kludge...

GaTTaCa

3 Answers


You can set the device order for the CUDA environment in your shell so that it follows the PCI bus ID instead of the default, fastest-card-first order. This requires CUDA 7 or later.

export CUDA_DEVICE_ORDER=PCI_BUS_ID
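The variable has to be present in the environment of the CUDA program when it starts: export sets it for the current shell and every process launched from it (in a Windows cmd prompt like the one in the question, the equivalent is set CUDA_DEVICE_ORDER=PCI_BUS_ID). As a minimal sketch, assuming the CUDA application is launched from a Python script, the variable can also be set for a single child process only. The helper name run_with_pci_bus_order is hypothetical, and the child process in the example is a stand-in that only echoes the value it sees.

```python
from __future__ import annotations

import os
import subprocess
import sys


def run_with_pci_bus_order(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd with CUDA_DEVICE_ORDER=PCI_BUS_ID set in the child's environment only.

    The parent's own environment is left untouched; the child gets a copy
    with the extra variable added before it starts.
    """
    env = dict(os.environ, CUDA_DEVICE_ORDER="PCI_BUS_ID")
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for a real CUDA application: it just prints the variable it received.
    child = [sys.executable, "-c", "import os; print(os.environ['CUDA_DEVICE_ORDER'])"]
    print(run_with_pci_bus_order(child).stdout.strip())  # PCI_BUS_ID
```

Setting the variable before the process starts avoids depending on when inside the program CUDA gets initialized.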
Andrew K
Teshy

It's expected behavior.

nvidia-smi enumerates in PCI order.

By default, the CUDA driver and runtime APIs do not.

The question you linked clearly shows how to associate the two numbering/ordering schemes.

There is no way to make nvidia-smi change its ordering to match the one generated by the CUDA runtime or driver APIs. However, you can change the CUDA runtime enumeration order with an environment variable (CUDA_DEVICE_ORDER) in CUDA 8.
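One way to associate the two numbering schemes without NVML is to match devices by PCI bus ID, which both sides can report. A minimal sketch, assuming you have the output of nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader and a list of PCI bus ID strings indexed by CUDA ordinal (collected in your own code, for example with cudaDeviceGetPCIBusId() or cuDeviceGetPCIBusId()). The function names are hypothetical, and the bus IDs in the example are invented for illustration.

```python
from __future__ import annotations


def parse_bdf(bus_id: str) -> tuple[int, int, int, int]:
    """Parse a 'domain:bus:device.function' string (hex fields) into integers."""
    domain, bus, dev_fn = bus_id.strip().split(":")
    device, function = dev_fn.split(".")
    # Compare numerically: the tools may pad the domain to different widths,
    # e.g. "00000000:02:00.0" vs "0000:02:00.0".
    return (int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))


def map_smi_to_cuda(smi_csv: str, cuda_bus_ids: list[str]) -> dict[int, int]:
    """Return {nvidia-smi index: CUDA device ordinal}.

    smi_csv: output of
        nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader
    cuda_bus_ids: PCI bus ID strings indexed by CUDA ordinal (assumed to be
        gathered with cudaDeviceGetPCIBusId() or cuDeviceGetPCIBusId()).
    """
    cuda_by_bdf = {parse_bdf(b): ordinal for ordinal, b in enumerate(cuda_bus_ids)}
    mapping = {}
    for line in smi_csv.strip().splitlines():
        index, bus_id = (field.strip() for field in line.split(","))
        mapping[int(index)] = cuda_by_bdf[parse_bdf(bus_id)]
    return mapping


if __name__ == "__main__":
    # Invented bus IDs for the two Quadros in the question.
    smi_output = "0, 00000000:02:00.0\n1, 00000000:03:00.0"
    cuda_ids = ["0000:03:00.0", "0000:02:00.0"]  # CUDA ordinal 0 sits on bus 03
    print(map_smi_to_cuda(smi_output, cuda_ids))  # {0: 1, 1: 0}
```

With the invented values above, nvidia-smi's GPU 0 is CUDA device 1 and vice versa, which is the swap described in the question.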

Robert Crovella
  • Even though the documentation also states that it sorts by pciBusId, I doubt that is the only criterion, since on my machine two Tesla K80s are on the same pciBusId. I wonder what the correct order is for those two babies. – Marc J. Schmidt Nov 08 '17 at 12:44
  • "2x Tesla K80 are on the same pciBusid" not possible. Take a look closely at your deviceQuery output – Robert Crovella Nov 08 '17 at 16:10
  • Well, there can be several cards on the same pciBusId (as reported by http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g1bf9d625a931d657e08db2b4391170f0), distinguishable only by pciDeviceID. TensorFlow, for example, prints pciBusID: 0000:00:04.0 for the first card and pciBusID: 0000:00:05.0 for the second, yet both have the same bus ID. The string "0000:00:05.0" is built as "[domain]:[bus]:[device].[function]" (see http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1gea264dad3d8c4898e0b82213c0253def) – Marc J. Schmidt Nov 08 '17 at 16:34
  • The PCI_BUS_ID token used here refers to the full BDF format of PCI device enumeration. In that sense, two separate GPU devices cannot have the same full BDF, and the token as used in the environment variable will order devices consistently by sorted BDF order. – Robert Crovella Nov 08 '17 at 16:40
  • I see, thanks for the explanation. I already thought that "pciBusId" alone is a bit misleading when there are several formats. So it's good to know that the Bus:Device.Function (BDF) notation is what is used to order the devices. – Marc J. Schmidt Nov 09 '17 at 16:31
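The ordering described in these comments can be modeled as a sort on the full (domain, bus, device, function) tuple. This is a minimal sketch of that idea, not NVIDIA's implementation; bdf_key and pci_order are hypothetical helper names, and the example reuses the two addresses from the TensorFlow output quoted in the comments.

```python
from __future__ import annotations


def bdf_key(bus_id: str) -> tuple[int, int, int, int]:
    """Sort key: the full (domain, bus, device, function) address as integers."""
    domain, bus, dev_fn = bus_id.split(":")
    device, function = dev_fn.split(".")
    return (int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))


def pci_order(bus_ids: list[str]) -> list[str]:
    """Order GPUs by full BDF address, lowest first."""
    return sorted(bus_ids, key=bdf_key)


if __name__ == "__main__":
    # Same bus (00), different device numbers (04 vs 05): the device field decides.
    print(pci_order(["0000:00:05.0", "0000:00:04.0"]))  # ['0000:00:04.0', '0000:00:05.0']
```

Two GPUs sharing a bus number still get a deterministic order because the device and function fields break the tie.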

It's the expected behaviour.

The nvidia-smi manpage describes the index as

the GPU/Unit's 0-based index in the natural enumeration returned by the driver,

The CUDA API enumerates devices in descending order of compute capability, according to section 3.2.6.1, "Device Enumeration", of the Programming Guide.

I ran into this problem and wrote a program that is an analog of nvidia-smi but enumerates devices in an order consistent with the CUDA API. Here is the link to the program:

https://github.com/smilart/nvidia-cdl

I wrote it because nvidia-smi cannot enumerate devices in an order consistent with the CUDA API.

AlexanderKomarov