
The bash file I used to launch the training looks like this:

CUDA_VISIBLE_DEVICES=3,4 python -m torch.distributed.launch \
    --nproc_per_node=2 train.py \
    --batch_size 6 \
    --other_args

I found that the batch size of the tensors on each GPU is actually batch_size / num_of_gpus = 6 / 2 = 3.

When I initialize my network, I need to know the batch size on each GPU. (P.S. at this stage I can't use input_tensor.shape to read the size of the batch dimension, since no data has been fed in yet.)

Somehow I could not find where PyTorch stores the --nproc_per_node parameter. So how can I find out how many GPUs are used, without passing it manually through --other_args?

zheyuanWang

Not sure if you can access these args in the child process, but if you always set the cuda visible devices, maybe a workaround could be `len(os.getenv('CUDA_VISIBLE_DEVICES').split(','))` or `torch.cuda.device_count()`? – Berriel Aug 16 '21 at 12:58
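
A minimal sketch of both workarounds mentioned in the comment (assuming CUDA_VISIBLE_DEVICES is always set, as it is in the launch script above):

import os

import torch

# Workaround 1: count the entries in CUDA_VISIBLE_DEVICES, e.g. "3,4" -> 2.
# This only works when the variable is guaranteed to be set.
num_gpus = len(os.getenv("CUDA_VISIBLE_DEVICES").split(","))

# Workaround 2: ask PyTorch directly; device_count() respects
# CUDA_VISIBLE_DEVICES, so it also returns 2 here.
num_gpus = torch.cuda.device_count()

Note that both workarounds count visible GPUs rather than launched processes, so they only match --nproc_per_node when exactly one process is started per GPU.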

1 Answer


I think you are looking for torch.distributed.get_world_size() - this will tell you how many processes were created (on a single node, that is exactly the value passed to --nproc_per_node). Note that it requires the default process group to be initialized first, e.g. via torch.distributed.init_process_group().
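
For example, a minimal sketch of how this could look inside train.py (assuming the default env:// initialization that torch.distributed.launch provides and the NCCL backend; the hard-coded total batch size stands in for the parsed --batch_size argument):

import torch.distributed as dist

total_batch_size = 6  # stand-in for the parsed --batch_size argument

# torch.distributed.launch exports WORLD_SIZE and RANK as environment
# variables in each child process, so the default env:// initialization
# picks them up without any extra arguments.
dist.init_process_group(backend="nccl")

world_size = dist.get_world_size()  # nproc_per_node * nnodes = 2 here
per_gpu_batch_size = total_batch_size // world_size  # 6 // 2 = 3
print(f"rank {dist.get_rank()}: per-GPU batch size = {per_gpu_batch_size}")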

Shai