
I want to train my model with 2 GPUs (IDs 5 and 6), so I run my script with CUDA_VISIBLE_DEVICES=5,6 python train.py. However, when I print torch.cuda.current_device() I still get ID 0 rather than 5 or 6. On the other hand, torch.cuda.device_count() is 2, which seems right. How can I use GPUs 5 and 6 correctly?

2 Answers


It is most likely correct. PyTorch only sees two GPUs (therefore indexed 0 and 1), which are actually your physical GPUs 5 and 6.

Check the actual usage with nvidia-smi. If it is still inconsistent, you might need to set an environment variable:

export CUDA_DEVICE_ORDER=PCI_BUS_ID

(See Inconsistency of IDs between 'nvidia-smi -L' and cuDeviceGetName())
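As a quick sanity check, you can verify the remapping from within PyTorch (a minimal sketch; the printed device names depend on your hardware):

    import torch

    # With CUDA_VISIBLE_DEVICES=5,6 set, PyTorch re-indexes the visible GPUs as 0 and 1.
    print(torch.cuda.device_count())    # expected: 2
    print(torch.cuda.current_device())  # expected: 0 (the first visible GPU, i.e. physical GPU 5)
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))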

hkchengrex

You can check the device name to verify that it is the GPU you expect. However, when you set CUDA_VISIBLE_DEVICES outside the script, you force torch to see only those two GPUs, so torch re-indexes them as 0 and 1. Because of this, current_device() outputs 0.
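In other words (a minimal sketch, assuming CUDA_VISIBLE_DEVICES=5,6 is set), the indices you use inside the process refer to the visible GPUs, not the physical IDs:

    import torch

    # cuda:0 and cuda:1 here correspond to physical GPUs 5 and 6.
    x = torch.randn(8, 8, device="cuda:0")
    y = torch.randn(8, 8, device="cuda:1")
    print(x.device, y.device)  # cuda:0 cuda:1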

dtlam26
  • Can I set both of them to be the current device? I ask because I want to use both of them. – Shengyu Liu Sep 19 '20 at 13:49
  • Yes, of course. You can use this tutorial as a reference: https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html. – dtlam26 Sep 19 '20 at 14:05
  • Multiple GPUs are only used for parallel processing. If you don't declare your model for multi-GPU use, it will automatically run on a single GPU with index 0. – dtlam26 Sep 19 '20 at 14:07
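Following the tutorial linked in the comment above, a minimal DataParallel sketch that uses both visible GPUs (the model and tensor sizes below are placeholders):

    import torch
    import torch.nn as nn

    model = nn.Linear(128, 10)              # placeholder model
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)      # replicates the model across all visible GPUs (0 and 1)
    model = model.to("cuda")

    inputs = torch.randn(32, 128, device="cuda")
    outputs = model(inputs)                 # the batch is split between the two GPUs
    print(outputs.shape)                    # torch.Size([32, 10])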