Gensim intends to match the Facebook implementation, but with a few known or intentional differences. Specifically, Gensim doesn't implement:
- the `-supervised` option, and the autotuning/quantization/pretrained-vectors options specific to that mode
- word multigrams (as controlled by the `-wordNgrams` parameter to `fasttext`)
- the plain `softmax` option for loss optimization
Regarding options to `-loss`: I'm relatively sure that, despite Facebook's command-line options docs indicating that the `fasttext` default is `softmax`, it is actually `ns` except when in `-supervised` mode, just like `word2vec.c` and Gensim. See for example this source code.
I suspect a future contribution to Gensim that adds `wordNgrams` support would be welcome, if that mode is useful to some users, and to match the reference implementation.
So far Gensim has chosen to avoid any supervised algorithms, so the `-supervised` mode is less likely to appear in any future Gensim release. (I'd argue for it, though, if a working implementation were contributed.)
The plain `softmax` mode is so much slower on typical large output vocabularies that few non-academic projects would want to use it over `hs` or `ns`. (It may still be practical with a smaller number of output labels, as in `-supervised` mode, though.)
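A back-of-envelope calculation (pure Python; the vocabulary size and sample counts are illustrative assumptions) shows why: plain softmax must compute a score against every output row per training example, while negative sampling only touches the one positive row plus `k` sampled negatives.

```python
vocab_size = 1_000_000   # assumed large word vocabulary
num_labels = 50          # assumed label count for a -supervised-style task
k = 5                    # negative samples per example

# Output rows touched per training example under each loss mode.
softmax_rows = vocab_size   # plain softmax: every output row
ns_rows = 1 + k             # negative sampling: positive + k negatives

print(softmax_rows // ns_rows)   # roughly how many times more work softmax does
print(num_labels)                # with few labels, plain softmax stays cheap
```

With a million-word vocabulary the ratio is on the order of 10^5, but with only dozens of output labels the full softmax touches so few rows that it remains affordable, which is why it can still make sense in `-supervised` mode.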