
I wanted to pre-train BERT with data from my own language, since the multilingual BERT model (which includes my language) is not successful for it. Since full pre-training costs a lot, I decided to fine-tune it on its own two tasks: masked language modeling and next sentence prediction. There are previous implementations for other tasks (NER, sentiment analysis, etc.), but I couldn't find any fine-tuning on BERT's own pre-training tasks. Is there an implementation that I missed? If not, where should I start? I need some initial help.

ozler.kb

1 Answer


A wonderful resource for BERT is: https://github.com/huggingface/pytorch-pretrained-BERT. This repository contains op-for-op PyTorch reimplementations, pre-trained models and fine-tuning examples for Google's BERT model.

You can find the language model fine-tuning examples in the following link. The three example scripts in this folder can be used to fine-tune a pre-trained BERT model using the pretraining objective (the combination of masked language modeling and next sentence prediction loss).
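If it helps to see what that combined objective looks like in code, here is a minimal sketch of a single training step using `BertForPreTraining` from that package, which carries both the masked-LM and next-sentence-prediction heads. This is not the repository's fine-tuning script: the example sentences, the masked position, and the learning rate are arbitrary placeholders, and a real run would build batches from your own corpus (as the example scripts do).

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForPreTraining

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model = BertForPreTraining.from_pretrained('bert-base-multilingual-cased')
model.train()

# One sentence pair for next-sentence prediction
# (label 0 = sentence B really follows sentence A).
tokens_a = tokenizer.tokenize("the man went to the store .")
tokens_b = tokenizer.tokenize("he bought a gallon of milk .")
tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)

# Masked-LM objective: positions labeled -1 are ignored by the loss;
# here we mask a single arbitrary position for illustration.
masked_lm_labels = [-1] * len(tokens)
mask_pos = 2
masked_lm_labels[mask_pos] = tokenizer.convert_tokens_to_ids([tokens[mask_pos]])[0]
tokens[mask_pos] = "[MASK]"

input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
segment_ids = torch.tensor([segment_ids])
attention_mask = torch.ones_like(input_ids)
lm_labels = torch.tensor([masked_lm_labels])
next_sentence_label = torch.tensor([0])

optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)

# With both label tensors supplied, the forward pass returns the summed
# masked-LM + next-sentence-prediction loss.
loss = model(input_ids, segment_ids, attention_mask,
             masked_lm_labels=lm_labels,
             next_sentence_label=next_sentence_label)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In practice you would loop this over shuffled sentence pairs (half true continuations, half random sentences) with ~15% of tokens masked, which is exactly what the example scripts in the repository automate for you.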

By the way, multilingual BERT is available for 104 languages (ref) and has been found surprisingly effective in many cross-lingual NLP tasks (ref). So before investing in pre-training from scratch, make sure you are using multilingual BERT appropriately for your task.

Wasi Ahmad