
I want to compare word2vec and fastText models based on this comparison tutorial: https://github.com/jayantj/gensim/blob/fast_text_notebook/docs/notebooks/Word2Vec_FastText_Comparison.ipynb

According to it, the semantic accuracy of the fastText model increases when we set the maximum length of character n-grams to zero, so that fastText starts to behave almost like word2vec: it ignores the n-grams.
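To see why zeroing the maximum n-gram length makes fastText degenerate toward word2vec, here is a minimal sketch (not gensim's actual implementation) of fastText-style character n-gram extraction, using the `<` and `>` word-boundary markers: with `max_n=0` the n-gram set is empty, so only the whole-word vector is left.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Collect the character n-grams of a word, fastText-style,
    with '<' and '>' marking the word boundaries."""
    wrapped = "<" + word + ">"
    ngrams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[i:i + n])
    return ngrams

print(char_ngrams("cat"))
# ['<ca', 'cat', 'at>', '<cat', 'cat>', '<cat>']

print(char_ngrams("cat", max_n=0))
# [] -- no n-grams at all, so the model behaves like word2vec
```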

However, I cannot find any information on how to set this parameter when loading a fastText model. Any ideas on how to do this?

utengr

1 Answer


The parameter is set at training time – and then the model is built using that parameter, and dependent on that parameter for interpretation. So you wouldn't typically change it upon loading an already-trained model, and there's no API in gensim (or the original FastText) to change the setting on an already-trained model.

(By looking at the source and tampering with the loaded model state directly, you might be able to approximate the effect of ignoring char-ngrams that had been trained – but that'd be a novel mode, not at all like the no-ngrams-trained mode evaluated in the notebook you've linked. It might generate interesting, or awful, results – no way to tell without trying it.)

gojomo
  • It makes sense if it's set in the training phase. So I guess the author of the notebook trained the model him/herself with this parameter set to zero. If not, then there may already be pre-trained models available online that ignore n-grams. – utengr Aug 09 '17 at 09:04
  • Yes, as shown in that notebook itself, it's doing its own training rather than loading any pre-trained models. – gojomo Aug 09 '17 at 15:24