
I am new to fastText, a library for efficient learning of word representations and sentence classification. I am trying to generate word vectors for a huge data set, but in a single process it takes a very long time.

So let me put my questions clearly:

  • Are there any options I can use to speed up a single fastText process?
  • Is there any way to generate word vectors in parallel fastText processes?
  • Are there any other implementations or workarounds available that could solve the problem? I read that a Caffe2 implementation is available, but I am unable to find it.

Thanks

  • Facebook production runs on Caffe, so what they are talking about is probably the fastText code running on their production servers, not any open-source version. – joydeep bhattacharjee May 05 '18 at 12:44

2 Answers


I understand your question as: you would like to distribute fastText and do parallel training.

As mentioned in Issue #144

... a future feature we might consider implementing. For now it's not on our list of priorities, but it might very well soon.

Apart from the Word2Vec Spark implementation that is also mentioned there, I am not aware of any other implementations.


The original fastText release by Facebook includes a command-line option, -thread (default 12), which controls the number of worker threads used for parallel training on a single machine. If you have more CPU cores and haven't yet tried increasing it, try that.
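For example, a minimal sketch using the official fasttext Python bindings (pip install fasttext), where the thread parameter maps to the same option; the file names here are placeholders:

    import fasttext

    # Train skip-gram word vectors with 16 worker threads on one machine.
    # 'data.txt' is a placeholder for your own training corpus.
    model = fasttext.train_unsupervised('data.txt', model='skipgram', thread=16)
    model.save_model('vectors.bin')  # placeholder output path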

The gensim implementation (as gensim.models.fasttext.FastText) includes an initialization parameter, workers, which controls the number of worker threads. If you haven't yet tried increasing it, up to the number of cores, it may help. However, due to extra multithreading bottlenecks in its Python implementation, if you have a lot of cores (especially 16+), you might find maximum throughput with fewer workers than cores – often something in the 4-12 range. (You have to experiment & watch the achieved rates via logging to find the optimal value, and all cores won't be maxed.)
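As an illustration, a minimal sketch with gensim's FastText class; the parameter names follow gensim 4.x (vector_size was called size in 3.x), and the tiny in-memory corpus is only a placeholder:

    from gensim.models import FastText

    # Toy corpus; in practice, stream pre-tokenized sentences from disk.
    sentences = [["fast", "text", "learns", "subword", "vectors"],
                 ["increase", "workers", "to", "use", "more", "cores"]]

    # workers sets the number of training threads; per the advice above,
    # peak throughput on many-core machines is often at 4-12 workers.
    model = FastText(sentences=sentences, vector_size=100, workers=8, min_count=1)
    print(model.wv["fast"][:5])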

You'll only get significant multithreading in gensim if your installation is able to make use of its Cython-optimized routines. If you watch the output when you install gensim via pip or similar, there should be a clear error if compiling these fails. Or, if you watch the logs/output when loading/using gensim classes, there will usually be a warning if the slower non-optimized versions are being used.
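As a quick check in gensim 3.x (where a pure-Python fallback still existed; gensim 4.x removed it and requires the compiled routines), the FAST_VERSION constant reports which path is active:

    from gensim.models.word2vec import FAST_VERSION

    # -1 means the slow pure-Python path is in use; a value >= 0 means the
    # Cython-optimized routines were compiled and will be used for training.
    print(FAST_VERSION)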

Finally, in the way people often use gensim, the bottleneck can be the corpus iterator or I/O rather than the parallelism. To minimize this slowdown:

  • Check how fast your corpus can iterate over all examples separately from passing it to the gensim class (see the timing sketch after this list).
  • Avoid doing any database-selects or complicated/regex preprocessing/tokenization in the iterator – do it once, and save the easy-to-read-as-tokens resulting corpus somewhere.
  • If the corpus is coming from a network volume, test if streaming it from a local volume helps. If coming from a spinning HD, try an SSD.
  • If the corpus can be made to fit in RAM, perhaps on a special-purpose giant-RAM machine, try that.
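A minimal sketch of that bare-iteration timing, assuming a pre-tokenized, one-sentence-per-line file on local disk ('corpus.txt' is a placeholder); gensim's LineSentence helper yields each line as a list of tokens:

    import time
    from gensim.models.word2vec import LineSentence

    corpus = LineSentence('corpus.txt')  # placeholder path

    start = time.time()
    n_sentences = sum(1 for _ in corpus)  # iterate with no training at all
    elapsed = time.time() - start
    print(f"{n_sentences} sentences in {elapsed:.1f}s "
          f"({n_sentences / elapsed:.0f} sentences/s)")

If this bare loop is already slow compared to the training throughput gensim logs, the iterator, not the thread count, is the thing to fix first.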