Is it possible to fine-tune FastText models?

Question

I'm working on a text similarity project using FastText. The basic example I have found for training a model is:

from gensim.models import FastText

model = FastText(tokens, size=100, window=3, min_count=1, iter=10, sorted_vocab=1)

As I understand it, since I'm specifying the vector and ngram size, the model is being trained from scratch here, and if the dataset is small I wouldn't expect great results.

The other option I have found is to load the original Wikipedia model, which is a huge file:

from gensim.models.wrappers import FastText

model = FastText.load_fasttext_format('wiki.simple')

My question is: can I load the Wikipedia model, or any other pretrained model, and fine-tune it with my dataset?

score 4 · Answer 1 · answered Sep 10 '19 at 03:30


If you have a labelled dataset, then you should be able to fine-tune on it. This GitHub issue explains that you want to use the pretrainedVectors option: you start with the Wikipedia pretrained vectors, then train on your dataset. It seems that gensim can do this too, but according to this GH issue, there have been some bugs.
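As a rough illustration of the pretrainedVectors route, here is a minimal sketch using the official fasttext Python package rather than gensim. The helper names (to_fasttext_line, write_training_file, fine_tune) and the file names in the usage note are made up for this example. It also assumes you have the text .vec file of the pretrained vectors, and that dim matches their dimensionality (300 for the published Wikipedia vectors).

```python
from pathlib import Path


def to_fasttext_line(label, text):
    """Format one example in fastText's supervised format: __label__<label> <text>."""
    # Collapse all whitespace so each example stays on a single line.
    return f"__label__{label} {' '.join(text.split())}"


def write_training_file(examples, path):
    """Write (label, text) pairs to `path`, one example per line (hypothetical helper)."""
    lines = [to_fasttext_line(label, text) for label, text in examples]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def fine_tune(train_path, vectors_path, dim=300):
    """Train a supervised model whose word vectors start from pretrained ones."""
    import fasttext  # imported here so the helpers above work without the package

    # Assumption: vectors_path points at the text .vec file (not the .bin model)
    # and dim equals the dimensionality of those vectors.
    return fasttext.train_supervised(
        input=train_path,
        pretrainedVectors=vectors_path,
        dim=dim,
    )
```

Usage would look something like write_training_file(examples, "train.txt") followed by model = fine_tune("train.txt", "wiki.simple.vec"), where both file names are placeholders. Note that this trains a classifier on top of the pretrained vectors; it does not continue the unsupervised training.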

Sam H.


I'm looking to fine-tune fastText embeddings (unsupervised) on a domain corpus. How can I achieve it? – Hari Prasad Jan 30 '20 at 14:56
@HariPrasad Look at the first link I posted. FastText doesn't support this. – Sam H. Jan 30 '20 at 15:32
