
This error occurs when I use DataParallel, but everything works when I use only one GPU. Why does this happen, and how can I fix it?


terminate called after throwing an instance of 'std::runtime_error'
what():  NCCL Error 1: unhandled cuda error
Aborted (core dumped)

My code is:

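from torch.nn import DataParallel

gpus = [0, 1, 2]  # GPU ids to replicate the model on
my_model.to(pytorch_device)  # parameters must be on gpus[0] before wrapping
my_model = DataParallel(my_model, device_ids=gpus, output_device=gpus[0])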
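One thing that may be worth ruling out first is whether all three requested GPU ids are actually visible to the process. The following is only a minimal diagnostic sketch, not a known fix: the helper names `check_device_ids` and `visible_gpu_count` are hypothetical, and it assumes PyTorch is importable (it falls back to a count of 0 otherwise).

```python
def check_device_ids(device_ids, visible_count):
    """Return the requested GPU ids that are outside the visible range."""
    return [i for i in device_ids if not 0 <= i < visible_count]


def visible_gpu_count():
    """Number of CUDA devices PyTorch can see, or 0 if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


if __name__ == "__main__":
    gpus = [0, 1, 2]  # same ids as in the question
    missing = check_device_ids(gpus, visible_gpu_count())
    if missing:
        print("GPU ids not visible to this process:", missing)
    else:
        print("All requested GPU ids are visible.")
```

If every id is visible, running the script with the environment variable NCCL_DEBUG=INFO set usually makes NCCL print more detail about the underlying CUDA error.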
