I am training a Detectron2 model on Google Cloud Platform and want to run this model on 4 GPUs.
To launch the training I am using:
    from detectron2.engine import launch
    if __name__ == "__main__":
        launch(main, num_gpus_per_machine=4)
But when I run the training, I get an error: "ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable MASTER_ADDR expected, but not set"
When I ran the same training with num_gpus_per_machine=1, it ran fine. What does this error mean, and how can I solve it?
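
For context on the error message: the env:// rendezvous in torch.distributed reads the master address and port from the MASTER_ADDR and MASTER_PORT environment variables. With one GPU no process group is needed, which would explain why that run never hit the check. Below is a minimal sketch of setting those variables for a single machine before calling launch. The helper names find_free_port and set_single_machine_rendezvous are hypothetical (not part of Detectron2 or PyTorch), and it assumes all 4 GPUs are on the same machine.

```python
import os
import socket


def find_free_port():
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


def set_single_machine_rendezvous(addr="127.0.0.1"):
    """Hypothetical helper: set MASTER_ADDR / MASTER_PORT if they are unset.

    Assumes a single machine, so the master address is localhost.
    Values that are already present in the environment are left untouched.
    """
    os.environ.setdefault("MASTER_ADDR", addr)
    os.environ.setdefault("MASTER_PORT", str(find_free_port()))
    return os.environ["MASTER_ADDR"], int(os.environ["MASTER_PORT"])


# Usage sketch (not executed here), keeping the original launch call:
#
#     if __name__ == "__main__":
#         set_single_machine_rendezvous()
#         launch(main, num_gpus_per_machine=4)
#
# Alternative, assuming a Detectron2 version whose launch() accepts it:
#
#         launch(main, num_gpus_per_machine=4, dist_url="auto")
```

Worker processes started from this script inherit the parent's environment, so setting the variables once before launch should be enough on a single machine. The dist_url="auto" option, where available, avoids the environment variables entirely by choosing a free localhost port itself.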