I want to train my model with DistributedDataParallel on a single machine that has 8 GPUs, but I only want to use the four GPUs with device IDs 4, 5, 6, and 7.
How do I specify these GPU device IDs for DistributedDataParallel?
I think the world size will be 4 in this case, but what should the ranks be?
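For reference, here is a minimal sketch of what I have in mind (assuming a single-node setup with the nccl backend, mp.spawn, and MASTER_ADDR/MASTER_PORT pointing at localhost; the mapping from rank to physical GPU ID is exactly the part I'm unsure about):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

WORLD_SIZE = 4          # four processes, one per GPU
GPU_IDS = [4, 5, 6, 7]  # the physical GPUs I want to train on

def worker(rank):
    # rank runs from 0 to 3; my guess is the process should use GPU_IDS[rank]
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=WORLD_SIZE)

    gpu = GPU_IDS[rank]
    torch.cuda.set_device(gpu)

    model = nn.Linear(10, 10).cuda(gpu)  # placeholder model
    ddp_model = DDP(model, device_ids=[gpu])

    # ... training loop would go here ...

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, nprocs=WORLD_SIZE)
```

Is passing ranks 0-3 and mapping each rank to GPUs 4-7 like this the right approach, or should the ranks themselves be 4-7?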