
I want to train my model with 2 GPUs (IDs 5 and 6), so I run my script with CUDA_VISIBLE_DEVICES=5,6 python train.py. However, when I print torch.cuda.current_device() I still get ID 0 rather than 5 or 6. On the other hand, torch.cuda.device_count() is 2, which seems right. How can I use GPUs 5 and 6 correctly?

2 Answers


It is most likely correct. PyTorch only sees two GPUs (therefore indexed 0 and 1), which are actually your physical GPUs 5 and 6.

Check the actual usage with nvidia-smi. If it is still inconsistent, you might need to set an environment variable:

export CUDA_DEVICE_ORDER=PCI_BUS_ID

(See Inconsistency of IDs between 'nvidia-smi -L' and cuDeviceGetName())
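As a quick sanity check, you can verify the remapping from within PyTorch (a minimal sketch; the printed device names depend on your hardware):

    import torch

    # With CUDA_VISIBLE_DEVICES=5,6 set, PyTorch re-indexes the visible GPUs as 0 and 1.
    print(torch.cuda.device_count())    # expected: 2
    print(torch.cuda.current_device())  # expected: 0 (the first visible GPU, i.e. physical GPU 5)
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))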

hkchengrex

You can check the device name to verify that it is the GPU you expect. However, when you set CUDA_VISIBLE_DEVICES outside the script, you force torch to see only those two GPUs, so torch re-indexes them as 0 and 1. Because of this, current_device() outputs 0.
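In other words (a minimal sketch, assuming CUDA_VISIBLE_DEVICES=5,6 is set), the indices you use inside the process refer to the visible GPUs, not the physical IDs:

    import torch

    # cuda:0 and cuda:1 here correspond to physical GPUs 5 and 6.
    x = torch.randn(8, 8, device="cuda:0")
    y = torch.randn(8, 8, device="cuda:1")
    print(x.device, y.device)  # cuda:0 cuda:1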

dtlam26
  • Can I set both of them to be the current device? I ask because I want to use both of them. – Shengyu Liu Sep 19 '20 at 13:49
  • Yes, of course. You can use this tutorial as a reference: https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html. – dtlam26 Sep 19 '20 at 14:05
  • Multiple GPUs are only used for parallel processing. If you don't declare your model for multi-GPU use, it will automatically run on a single GPU with index 0. – dtlam26 Sep 19 '20 at 14:07
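Following the tutorial linked in the comment above, a minimal DataParallel sketch that uses both visible GPUs (the model and tensor sizes below are placeholders):

    import torch
    import torch.nn as nn

    model = nn.Linear(128, 10)              # placeholder model
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)      # replicates the model across all visible GPUs (0 and 1)
    model = model.to("cuda")

    inputs = torch.randn(32, 128, device="cuda")
    outputs = model(inputs)                 # the batch is split between the two GPUs
    print(outputs.shape)                    # torch.Size([32, 10])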