
I would like to fine-tune facebook/mbart-large-cc25 on my own data using a pre-training task, in particular Masked Language Modeling (MLM).

How can I do that in HuggingFace?

Edit: rewrote the question for the sake of clarity

albero
  • I think for the most part you can simply follow the existing Q&A scripts (e.g., [these ones](https://github.com/huggingface/transformers/tree/master/examples/pytorch/question-answering)) and substitute in MBart. If you do need specific help, please make sure your post only includes *a single question*, to ensure answers are consistent. – dennlinger Sep 23 '21 at 09:44
  • I rewrote the question for clarity. – albero Sep 23 '21 at 09:50

1 Answer


Since you are doing everything in HuggingFace, fine-tuning a model on a pre-training task is much the same for most models, provided HuggingFace supports that task. Which tasks are you interested in fine-tuning mBART on?

HuggingFace provides extensive documentation for several fine-tuning tasks. For instance, the links below will help you fine-tune HF models for language modelling, MNLI, SQuAD, etc.: https://huggingface.co/transformers/v2.0.0/examples.html and https://huggingface.co/transformers/training.html
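To make the MLM objective itself concrete, here is a minimal sketch in plain Python of BERT-style token masking. It is an illustration only, not HuggingFace code: `mask_tokens`, its parameters, and the token ids are hypothetical names and values chosen for this example. The 15% selection rate and the 80/10/10 split follow the BERT recipe, which is also, to my understanding, what HF's `DataCollatorForLanguageModeling` applies by default. It does not cover how to wire this into mBART specifically.

```python
import random

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=(),
                mlm_probability=0.15, rng=None):
    """Return (inputs, labels) for one sequence using BERT-style masking.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Special tokens are never masked; other tokens are selected
        # with probability mlm_probability.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id                    # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels


# Made-up token ids; 0 and 2 stand in for special tokens, 5 for the mask token.
ids = [0, 17, 42, 99, 7, 2]
inputs, labels = mask_tokens(ids, mask_id=5, vocab_size=100,
                             special_ids={0, 2}, rng=random.Random(0))
print(inputs)
print(labels)
```

Positions whose label is `-100` contribute nothing to the loss, so the model is trained only on the tokens that were selected for masking.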

Zain Sarwar
  • Please note that the link to the examples is *extremely* outdated. Please refer to this one instead (latest version): https://huggingface.co/transformers/master/examples.html – dennlinger Sep 23 '21 at 09:53
  • I would like to do MLM using HF, but I didn't find any tutorial – albero Sep 23 '21 at 09:54
  • This link should help: https://github.com/huggingface/transformers/tree/master/examples/pytorch/language-modeling The script run_mlm.py can be used for fine-tuning your HF model. – Zain Sarwar Sep 23 '21 at 10:07
  • mBART is not supported according to the comment at line 19 https://github.com/huggingface/transformers/blob/62832c962f85b5a554ebf8b930d13b76b9028a8d/examples/pytorch/language-modeling/run_mlm.py#L19 – albero Sep 23 '21 at 14:03