
I'm using the Hugging Face transformers GPT-2 XL model to generate multiple responses. I'm trying to run it on multiple GPUs because GPU memory maxes out when generating multiple longer responses. I've tried using `DataParallel` to do this but, looking at nvidia-smi, it does not appear that the 2nd GPU is ever used. Here's my code:

import numpy as np
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

n_gpu = torch.cuda.device_count()
# device = xm.xla_device()
device = torch.device("cuda:0")

# downloaded pre-trained model and tokenizer earlier
tokenizer = GPT2Tokenizer.from_pretrained('/spell/GPT2Model/GPT2Model/')
model = GPT2LMHeadModel.from_pretrained('/spell/GPT2Model/GPT2Model/')
model.to(device)
model = torch.nn.DataParallel(model, device_ids=[0, 1])

encoded_prompt = tokenizer.encode(prompt_text, add_special_tokens=True, return_tensors="pt")
encoded_prompt = encoded_prompt.to(device)
outputs = model.module.generate(
    encoded_prompt,
    response_length,
    temperature=.8,
    num_return_sequences=num_of_responses,
    repetition_penalty=85,
    do_sample=True,
    top_k=80,
    top_p=.85,
)

The program gets OOM on dual T4s; memory use on the 2nd GPU never goes above 11 MiB.

rwreed
`DataParallel` duplicates the model across GPUs, but each copy is kept entirely on a single GPU; only the batch is split to distribute the data across the available GPUs. If you want to split parts of the model across GPUs, you'd need to manually place the layers and their inputs/outputs on those devices. [Model parallelism in pytorch for large(r than 1 GPU) models?](https://discuss.pytorch.org/t/model-parallelism-in-pytorch-for-large-r-than-1-gpu-models/778) outlines that approach. – Michael Jungo May 12 '20 at 01:33
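
A minimal sketch of that manual model-parallel approach, using a made-up toy module rather than GPT-2 (the layer sizes and split point are hypothetical; a real GPT-2 split would move individual transformer blocks between devices in the same way):

import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    """Toy model whose halves live on different GPUs (hypothetical layers/sizes)."""
    def __init__(self):
        super().__init__()
        # first half of the network lives on cuda:0
        self.part1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).to("cuda:0")
        # second half lives on cuda:1
        self.part2 = nn.Linear(1024, 1024).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # move the intermediate activations across GPUs by hand
        return self.part2(x.to("cuda:1"))

model = TwoGPUModel()
out = model(torch.randn(8, 1024))  # output tensor ends up on cuda:1

With this layout each GPU only holds its own slice of the parameters, which is what reduces per-GPU memory, whereas `DataParallel` keeps a full copy of the model on every device.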

0 Answers