import torch
import os

torch.distributed.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])

if local_rank > 0:
    torch.distributed.barrier()  # ranks other than 0 are meant to wait here

print(f"Entered process {local_rank}")

if local_rank == 0:
    torch.distributed.barrier()  # rank 0 joins the barrier, releasing the waiting ranks
The above code hangs forever, but if I remove both torch.distributed.barrier() calls then both print statements get executed.
On the command line I launch it with torchrun --nnodes=1 --nproc_per_node 2 test.py, where test.py is the name of the script above.
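In case it helps, this is the stripped-down check I would use to confirm how the two workers come up under the same launch command (just a sketch; RANK, LOCAL_RANK and WORLD_SIZE are the environment variables torchrun sets, and there is no barrier involved):

import os
import torch

# Report what each worker sees after init, without any barrier() calls.
torch.distributed.init_process_group(backend="nccl")
rank = int(os.environ["RANK"])
local_rank = int(os.environ["LOCAL_RANK"])
world_size = int(os.environ["WORLD_SIZE"])
print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
torch.distributed.destroy_process_group()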
I tried the original script with and without the torch.distributed.barrier() calls. With the barrier() calls I expected the statement to print for one GPU and the script to exit, but that is not what happens (it hangs). Without the barrier() calls I expected both prints to appear, and that works as expected.
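To make the working case explicit, the variant without the barriers is simply the same script with both barrier() calls removed (same launch command, nothing else changed):

import torch
import os

# Same script minus the two barrier() calls: both ranks print.
torch.distributed.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
print(f"Entered process {local_rank}")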
Am I missing something here?