
The bash file I used to launch the training looks like this:

CUDA_VISIBLE_DEVICES=3,4 python -m torch.distributed.launch \
    --nproc_per_node=2 train.py \
    --batch_size 6 \
    --other_args

I found that the batch size of the tensors on each GPU is actually batch_size / num_of_gpus = 6 / 2 = 3.

When I initialize my network, I need to know the batch size on each GPU. (P.S. at this stage I can't use input_tensor.shape to read the size of the batch dimension, since no data has been fed in yet.)

Somehow I could not find where PyTorch stores the --nproc_per_node parameter. So how can I find out how many GPUs are used, without passing it manually through --other_args?

zheyuanWang

Not sure if you can access these args in the child process, but if you always set the cuda visible devices, maybe a workaround could be `len(os.getenv('CUDA_VISIBLE_DEVICES').split(','))` or `torch.cuda.device_count()`? – Berriel Aug 16 '21 at 12:58
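
A minimal sketch of both workarounds mentioned in the comment (assuming CUDA_VISIBLE_DEVICES is always set, as it is in the launch script above):

import os

import torch

# Workaround 1: count the entries in CUDA_VISIBLE_DEVICES, e.g. "3,4" -> 2.
# This only works when the variable is guaranteed to be set.
num_gpus = len(os.getenv("CUDA_VISIBLE_DEVICES").split(","))

# Workaround 2: ask PyTorch directly; device_count() respects
# CUDA_VISIBLE_DEVICES, so it also returns 2 here.
num_gpus = torch.cuda.device_count()

Note that both workarounds count visible GPUs rather than launched processes, so they only match --nproc_per_node when exactly one process is started per GPU.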

1 Answer


I think you are looking for torch.distributed.get_world_size() - this will tell you how many processes were created (on a single node, that is exactly the value passed to --nproc_per_node). Note that it requires the default process group to be initialized first, e.g. via torch.distributed.init_process_group().
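
For example, a minimal sketch of how this could look inside train.py (assuming the default env:// initialization that torch.distributed.launch provides and the NCCL backend; the hard-coded total batch size stands in for the parsed --batch_size argument):

import torch.distributed as dist

total_batch_size = 6  # stand-in for the parsed --batch_size argument

# torch.distributed.launch exports WORLD_SIZE and RANK as environment
# variables in each child process, so the default env:// initialization
# picks them up without any extra arguments.
dist.init_process_group(backend="nccl")

world_size = dist.get_world_size()  # nproc_per_node * nnodes = 2 here
per_gpu_batch_size = total_batch_size // world_size  # 6 // 2 = 3
print(f"rank {dist.get_rank()}: per-GPU batch size = {per_gpu_batch_size}")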

Shai