
I am working on a PyTorch project, and I want to disable data parallelization to ensure that each program runs on a single specified GPU, avoiding memory duplication. I have followed the standard steps of moving the model to the desired GPU device and disabling data parallelization. However, when I simultaneously launch multiple instances of the program, I observe memory duplication across multiple GPUs.

Here are the steps I have taken:

I move the model to the desired GPU device using model.to(device), where device is set to a specific GPU device (e.g., torch.device("cuda:0")).
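To make that step concrete, here is a minimal, self-contained sketch of what I do (the Linear model is just a stand-in for my actual network, and the snippet falls back to CPU on machines without CUDA):

```python
import torch
import torch.nn as nn

# Example device; in my real code this is a specific GPU such as cuda:0.
# Falls back to CPU here so the snippet runs on machines without CUDA.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# A tiny stand-in model (my real model is an AlexNet variant)
model = nn.Linear(16, 4)

# .to(device) moves all parameters and buffers onto that one device
model = model.to(device)

# The input must live on the same device before the forward pass
x = torch.randn(2, 16, device=device)
out = model(x)
print(out.shape)  # torch.Size([2, 4])
```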

I disable data parallelization as follows:

model = model.to(device)
model = torch.nn.DataParallel(model, device_ids=[device])

Despite these steps, the memory is still duplicated across multiple GPUs when multiple instances of the program are running simultaneously. I want to ensure that each program uses only its designated GPU without memory duplication.
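For reference, my understanding is that restricting CUDA device visibility per process is another way to get this isolation; a sketch of that idea (the "0" below is only a placeholder for the target GPU index, and the variable must be set before CUDA is initialized, so in practice it is set in the shell when launching each instance):

```python
import os

# CUDA_VISIBLE_DEVICES must be set before torch initializes CUDA,
# so it is usually set per process in the shell, e.g.
#   CUDA_VISIBLE_DEVICES=1 python program.py
# The "0" here is only a placeholder for the target GPU index.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import torch  # imported after the variable so CUDA sees only one GPU

# With visibility restricted, cuda:0 is the single visible GPU;
# falls back to CPU on machines without CUDA.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
```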

The nvidia-smi results are as follows:

When I activate only a single GPU:

[nvidia-smi screenshot]

When I activate two GPUs but run only one Python program, and the memory is almost duplicated:

[nvidia-smi screenshot]

The parts of the program related to this problem are as follows:

import torch
import GPUtil

def load_model_on_lowest_memory_gpu(model):
    # GPUs ordered by memory utilization, lowest first
    available_gpus = GPUtil.getAvailable(order='memory', limit=torch.cuda.device_count())
    selected_gpu = torch.device("cuda:{}".format(available_gpus[0]))
    print(selected_gpu)
    model = model.to(selected_gpu)
    # Restrict DataParallel to the single selected device
    model = torch.nn.DataParallel(model, device_ids=[selected_gpu])
    return model

# in __main__:

net = AlexNet.AlexNet(8)
net.load_state_dict(torch.load(dict_path))
net = load_model_on_lowest_memory_gpu(net)

# when using the net (images is the input batch):

outputs = net(images.cuda())
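For completeness, here is a runnable sketch of that usage with a dummy batch standing in for `images` (the model is a simple stand-in for my AlexNet; `.to(device)` is used instead of the bare `.cuda()` so the snippet also runs on CPU):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in for my AlexNet with 8 output classes
net = nn.Linear(3 * 32 * 32, 8).to(device)

# Dummy batch standing in for `images`; note that a bare .cuda()
# targets the *current* CUDA device (cuda:0 by default), which is
# not necessarily the device the model was moved to.
images = torch.randn(4, 3 * 32 * 32)
outputs = net(images.to(device))
print(outputs.shape)  # torch.Size([4, 8])
```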

Am I missing something in the configuration, or is there another step I should follow to achieve this? Any help or guidance would be greatly appreciated.


0 Answers