import torch
import os
torch.distributed.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])

if local_rank > 0:
    torch.distributed.barrier()

print(f"Entered process {local_rank}")

if local_rank == 0:
    torch.distributed.barrier()

The above code hangs forever, but if I remove both torch.distributed.barrier() calls then both print statements get executed. Am I missing something here?

On the command line I launch the processes using torchrun --nnodes=1 --nproc_per_node 2 test.py, where test.py is the name of the script.

I tried the above code with and without the torch.distributed.barrier() calls. With the barrier() statements, I expected the print for one GPU and the script to exit, but instead it hangs. Without the barrier() statements, I expected both prints to appear, and they do.

1 Answer


It is better to put your multiprocessing initialization code inside an if __name__ == "__main__": guard to avoid endless process spawning, and to redesign the control flow to fit your purpose:

if __name__ == "__main__":
    import torch
    import os

    torch.distributed.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])

    if local_rank > 0:
        # non-zero ranks wait here until rank 0 reaches its barrier
        torch.distributed.barrier()
    else:
        # rank 0 prints first, then releases the waiting ranks
        print(f"Entered process {local_rank}")
        torch.distributed.barrier()
TQCH
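If the goal is for every rank to print, a minimal sketch of one common pattern is shown below. This is only a sketch, not part of the answer above: the torch.cuda.set_device(local_rank) call is an assumption based on the usual recommendation for the NCCL backend, where collectives such as barrier() can hang when every process ends up on the same default GPU.

import os
import torch
import torch.distributed as dist

if __name__ == "__main__":
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    # Assumed fix: pin each process to its own GPU before any NCCL collective.
    torch.cuda.set_device(local_rank)

    if local_rank == 0:
        print(f"Entered process {local_rank}")
        dist.barrier()  # release the other ranks
    else:
        dist.barrier()  # wait until rank 0 has printed
        print(f"Entered process {local_rank}")

    dist.destroy_process_group()

This can be launched the same way as in the question: torchrun --nnodes=1 --nproc_per_node 2 test.py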