I want to train my model on two GPUs (ids 5 and 6), so I run my code with CUDA_VISIBLE_DEVICES=5,6 python train.py. However, when I print torch.cuda.current_device() I still get id 0 rather than 5 or 6, while torch.cuda.device_count() is 2, which seems right. How can I use GPUs 5 and 6 correctly?

Shengyu Liu
2 Answers
It is most likely correct. PyTorch only sees two GPUs (therefore indexed 0 and 1), which are in fact your GPUs 5 and 6.
Check the actual usage with nvidia-smi. If the mapping still looks inconsistent, you might need to set an environment variable:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
(See Inconsistency of IDs between 'nvidia-smi -L' and cuDeviceGetName())
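
For a quick sanity check of which physical cards PyTorch was given, you can print the device names (a minimal sketch; the loop simply enumerates whatever CUDA_VISIBLE_DEVICES exposed):

# run as: CUDA_VISIBLE_DEVICES=5,6 python check_devices.py
import torch

for i in range(torch.cuda.device_count()):
    # i is PyTorch's re-mapped index; the name identifies the physical card
    print(i, torch.cuda.get_device_name(i))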

hkchengrex
You can check the device name to verify that it is the correct GPU. However, when you set CUDA_VISIBLE_DEVICES outside the script, you force PyTorch to see only those two GPUs, so it re-indexes them as 0 and 1. That is why torch.cuda.current_device() returns 0.
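
To illustrate the re-indexing, here is a minimal sketch; note that CUDA_VISIBLE_DEVICES must be set before torch initializes CUDA, which is why it is assigned before the import:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "5,6"  # must happen before CUDA is initialized

import torch
print(torch.cuda.device_count())    # 2 (only GPUs 5 and 6 are visible)
print(torch.cuda.current_device())  # 0 (physical GPU 5, re-indexed)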

dtlam26
-
Can I set both of them as the current device? I want to use both of them. – Shengyu Liu Sep 19 '20 at 13:49
-
Yes, of course. You can use this tutorial as a reference: https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html – dtlam26 Sep 19 '20 at 14:05
-
Multiple GPUs are only used for parallel processing. If you don't declare your model for multi-GPU use, it will automatically run on a single GPU with index 0. – dtlam26 Sep 19 '20 at 14:07
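
As the comments above suggest, a minimal sketch of the DataParallel pattern from the linked tutorial (the model and tensor shapes here are placeholders):

import torch
import torch.nn as nn

model = nn.Linear(10, 5)        # placeholder model
model = nn.DataParallel(model)  # replicate across all visible GPUs (0 and 1 here)
model = model.to("cuda:0")      # parameters live on the first visible GPU

x = torch.randn(8, 10).to("cuda:0")
out = model(x)                  # the batch is split across the two visible GPUs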