I am currently using SageMaker to train BERT and am trying to reduce the training time. I use PyTorch and Hugging Face on an AWS g4dn.12xlarge instance (a single node with 4 GPUs).
However, when I run parallel training across the GPUs, the speedup is far from linear. I'm looking for hints on distributed training that could improve BERT training time in SageMaker.
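For context, here is roughly the kind of launch configuration I mean. It is only a sketch: the entry point script, role, container versions, hyperparameters, and S3 path are placeholders rather than my exact setup, and the `distribution` setting is the part I am unsure about.

```python
# Minimal sketch of launching a Hugging Face training job on SageMaker.
# Names, versions, and paths below are placeholders, not my actual configuration.
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()

estimator = HuggingFace(
    entry_point="train.py",            # fine-tuning script using transformers.Trainer
    instance_type="ml.g4dn.12xlarge",  # 4x NVIDIA T4 GPUs, single node
    instance_count=1,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    role=role,
    hyperparameters={
        "model_name_or_path": "bert-base-uncased",
        "per_device_train_batch_size": 32,
        "epochs": 3,
    },
    # Intended to enable data parallelism across the 4 GPUs; whether this
    # (PyTorch DDP) or another mechanism is the right choice here is part
    # of what I am asking about.
    distribution={"pytorchddp": {"enabled": True}},
)

estimator.fit({"train": "s3://my-bucket/bert-train-data"})
```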