What is the real reason for the speed-up, given that the pipeline described in the fastText paper uses the same techniques (negative sampling and hierarchical softmax) as the earlier word2vec papers? I am not able to clearly identify the actual difference that makes this speed-up happen.
2 Answers
Is there that much of a speed-up?
I don't think there are any algorithmic breakthroughs which make the word2vec-equivalent word-vector training in FastText significantly faster. (And if you're using the character-ngrams option in FastText, to allow post-training synthesis of vectors for unseen words based on substrings shared with training-words, I'd expect the training to be slower, because every word requires training of its substring vectors as well.)
Any speedups in FastText are likely just because the code is well-tuned, with the benefit of more implementation experience.
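To make the character-ngrams cost mentioned above concrete, here is a minimal sketch of fastText-style subword extraction (this is not the actual fastText source; the function name is mine, though the `<`/`>` boundary markers and the 3-to-6 n-gram range follow the paper). It shows how one word turns into many subword vectors, each of which must be updated per occurrence:

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams fastText would train for `word`,
    using the '<' and '>' boundary markers described in the paper."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

print(char_ngrams("where"))
# ['<wh', 'whe', 'her', 'ere', 're>', '<whe', 'wher', 'here', 'ere>', ...]
# 14 n-gram vectors (plus the full-word vector itself) to update per
# occurrence, instead of a single word vector as in plain word2vec.
```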

- @gojomo @user2376672: by any chance did either of you find an answer? We have exactly the same question: https://groups.google.com/forum/#!searchin/fasttext-library/satyam%7Csort:relevance/fasttext-library/hY6lNjmKE1A/kcY3nE-TAwAJ – Anuj Gupta Jun 20 '17 at 14:11
- I believe my answer above is correct: there's only a marginal speedup, and it's only because of an incrementally better-optimized implementation or libraries (no algorithmic breakthroughs). If you believe the above is an insufficient explanation, my question for you is the same as for @user2376672 – is there that much of a speed-up? (What magnitude of speed-up are you trying to explain?) – gojomo Jun 20 '17 at 20:08
To be efficient on datasets with a very large number of categories, fastText uses a hierarchical classifier instead of a flat structure, in which the different categories are organized in a tree (think binary tree instead of a list). This reduces the time complexity of training and testing text classifiers from linear to logarithmic in the number of classes. fastText also exploits the fact that classes are imbalanced (some classes appear more often than others) by using the Huffman algorithm to build the tree that represents the categories. The depth of very frequent categories in the tree is therefore smaller than that of infrequent ones, leading to further computational efficiency. A toy sketch of this Huffman idea follows the reference link below.
Reference link: https://research.fb.com/blog/2016/08/fasttext/
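As a rough illustration of the Huffman point above (not fastText's actual implementation; the class labels and frequencies below are made up), this sketch builds a Huffman tree over class counts and reports each class's code length, i.e. the number of binary decisions needed to reach it:

```python
import heapq
import itertools

def huffman_code_lengths(freqs):
    """Return {label: code_length} for a Huffman tree built over `freqs`."""
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(f, next(counter), {label: 0}) for label, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least-frequent subtrees; every label inside them
        # moves one level deeper in the tree.
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {k: v + 1 for k, v in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

# Hypothetical, imbalanced class distribution for illustration only.
freqs = {"sports": 5000, "politics": 3000, "tech": 1500, "arts": 400, "opera": 100}
print(huffman_code_lengths(freqs))
# {'sports': 1, 'politics': 2, 'tech': 3, 'arts': 4, 'opera': 4}
```

Frequent classes like "sports" sit near the root (short codes), so the expected number of binary decisions per training example is close to the entropy of the class distribution rather than the total number of classes, which is where the linear-to-logarithmic saving comes from.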
