
Say I have the following model (from this script):

from transformers import AutoTokenizer, GPT2LMHeadModel, AutoConfig

# tokenizer and context_length are defined earlier in the linked script
config = AutoConfig.from_pretrained(
    "gpt2",
    vocab_size=len(tokenizer),
    n_ctx=context_length,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
model = GPT2LMHeadModel(config)

I'm currently using these training arguments for the Trainer:

from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="codeparrot-ds",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    evaluation_strategy="steps",
    eval_steps=5_000,
    logging_steps=5_000,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    weight_decay=0.1,
    warmup_steps=1_000,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    save_steps=5_000,
    fp16=True,
    push_to_hub=True,
)

# data_collator and tokenized_datasets also come from the linked script
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["valid"],
)
trainer.train()

How can I adapt this so the Trainer will use multiple GPUs (e.g., 8)?

I found this SO question, but they didn't use the Trainer and just used PyTorch's DataParallel:

model = torch.nn.DataParallel(model, device_ids=[0,1])

The Hugging Face docs on training with multiple GPUs are not really clear to me and don't include an example that uses the Trainer. Instead, I found here that they launch their Python file with an nproc_per_node argument, but that seems too specific to their script, and it's not clear how to use it in general. This contradicts this discussion on their forum, which says "The Trainer class automatically handles multi-GPU training, you don't have to do anything special." So this is confusing: on one hand they mention that extra steps are needed to train on multiple GPUs, and on the other hand that the Trainer handles it automatically. I'm not sure what to do.
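
For reference, the invocation I mean launches the script through PyTorch's distributed launcher rather than plain python. It looks roughly like this (torchrun ships with PyTorch; the script name and arguments here are just placeholders):

# torchrun starts one process per GPU (8 here) and sets the environment
# variables that distributed training needs to coordinate the processes.
torchrun --nproc_per_node=8 my_training_script.py --my_args ...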

Penguin
  • Unfortunately, there's no magic one-liner/argument (yet). But with a few more changes to your code you can use https://huggingface.co/docs/transformers/accelerate and https://huggingface.co/docs/transformers/perf_train_gpu_many#zero-data-parallelism – alvas Mar 22 '23 at 17:32
  • I'm trying to do this right now, it seems like there still is no way of using the plain trainer class to do this, but if someone has figured it out please answer! – sanminchui Aug 17 '23 at 20:42

1 Answer


I used one of the example Python scripts (e.g. run_clm.py), which already call trainer.train() internally: https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling

Create a finetune.sh bash file that executes the Python script inside:

#!/bin/bash
# The LD_LIBRARY_PATH export is specific to this environment (CUDA libs from a conda env)
export LD_LIBRARY_PATH=/home/miniconda3/envs/HF/lib/python3.7/.../nvidia/cublas/lib/:$LD_LIBRARY_PATH
export CUDA_VISIBLE_DEVICES=0,1  # will use two GPUs
###############################
python run_clm.py --options...

Then run it via bash; it will train on the two GPUs as defined:

$ nohup ./finetune.sh & 

If you want to run on all 8 available GPUs, simply comment out the CUDA_VISIBLE_DEVICES line so that every device stays visible:

#export CUDA_VISIBLE_DEVICES=0,1  # commented out, so all GPUs will be used
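
Note that when the script is started with plain python and more than one GPU is visible, the Trainer typically wraps the model in torch.nn.DataParallel. If you'd rather have one process per GPU (DistributedDataParallel), the same unmodified script can be started through the accelerate launcher mentioned in the comments; a minimal sketch, assuming accelerate is installed and the run_clm.py options stay as above:

# Sketch: one process per GPU via Hugging Face accelerate.
# Run `accelerate config` once beforehand to describe the hardware,
# or pass --num_processes explicitly as done here.
accelerate launch --num_processes 8 run_clm.py --options...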
ikirk