
I want to compare word2vec and fastText models based on this comparison tutorial: https://github.com/jayantj/gensim/blob/fast_text_notebook/docs/notebooks/Word2Vec_FastText_Comparison.ipynb

According to it, the semantic accuracy of the fastText model increases when we set the maximum length of character n-grams to zero, so that fastText starts to behave almost like word2vec: it ignores the n-grams.
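To see why zeroing the maximum n-gram length makes fastText degenerate toward word2vec, here is a minimal sketch (not gensim's actual implementation) of fastText-style character n-gram extraction, using the `<` and `>` word-boundary markers: with `max_n=0` the n-gram set is empty, so only the whole-word vector is left.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Collect the character n-grams of a word, fastText-style,
    with '<' and '>' marking the word boundaries."""
    wrapped = "<" + word + ">"
    ngrams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[i:i + n])
    return ngrams

print(char_ngrams("cat"))
# ['<ca', 'cat', 'at>', '<cat', 'cat>', '<cat>']

print(char_ngrams("cat", max_n=0))
# [] -- no n-grams at all, so the model behaves like word2vec
```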

However, I cannot find any information on how to set this parameter when loading a fastText model. Any ideas on how to do this?

utengr

1 Answer


The parameter is set at training time – and then the model is built using that parameter, and dependent on that parameter for interpretation. So you wouldn't typically change it upon loading an already-trained model, and there's no API in gensim (or the original FastText) to change the setting on an already-trained model.

(By looking at the source and tampering with the loaded model state directly, you might be able to approximate the effect of ignoring char-ngrams that had been trained – but that'd be a novel mode, not at all like the no-ngrams-trained mode evaluated in the notebook you've linked. It might generate interesting, or awful, results – no way to tell without trying it.)

gojomo
  • It makes sense if it's set in the training phase. So I guess the author of the notebook trained the model him/herself with this parameter set to zero. If not, then there may already be pre-trained models available online that ignore n-grams. – utengr Aug 09 '17 at 09:04
  • Yes, as shown in that notebook itself, it's doing its own training rather than loading any pre-trained models. – gojomo Aug 09 '17 at 15:24