4

I am using gensim to load fastText's pre-trained word embeddings:

from gensim.models import KeyedVectors
de_model = KeyedVectors.load_word2vec_format(r'wiki.de\wiki.de.vec')

But this gives me a memory error.

Is there any way I can load it?

shahaf
shasvat desai

1 Answer

6

Other than working on a machine with more memory, you can use the optional limit parameter of gensim's load_word2vec_format() method. Given a count n, it reads only the first n vectors of the file.

For example, to load just the first 100,000 words:

de_model = KeyedVectors.load_word2vec_format(r'wiki.de\wiki.de.vec', limit=100000)
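
To make it concrete what "only the first n vectors" means, here is a rough pure-Python sketch of reading a capped number of entries from the word2vec text format (a header line with the vocabulary size and dimension, then one word and its values per line). This is an illustration only, not gensim's actual implementation; the function name read_first_vectors and the tiny in-memory sample are made up for the example.

```python
import io


def read_first_vectors(lines, limit=None):
    """Parse word2vec text format, keeping at most `limit` vectors.

    Hypothetical helper for illustration; gensim does this internally
    (and far more efficiently) when you pass `limit=`.
    """
    it = iter(lines)
    # Header line: "<vocab_size> <dimension>"
    vocab_size, dim = (int(x) for x in next(it).split())
    n = vocab_size if limit is None else min(vocab_size, limit)

    vectors = {}
    for line in it:
        if len(vectors) >= n:
            break  # stop early: the rest of the file is never parsed
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(v) for v in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}")
        vectors[word] = values
    return vectors


# Tiny made-up sample standing in for a real .vec file.
sample = io.StringIO("3 2\nder 0.1 0.2\ndie 0.3 0.4\nund 0.5 0.6\n")
vectors = read_first_vectors(sample, limit=2)
print(list(vectors))  # ['der', 'die']
```

Because reading stops after n entries, memory use grows with n rather than with the size of the file.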

Since such files usually sort the more frequent words first, and the 'long tail' of rarer words tends to have weaker vectors, many applications don't lose much power by discarding the rarer words.
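
If you want a rough idea of which limit fits in memory, you can estimate the size of the raw vector matrix from the file's header line. The sketch below assumes 4-byte floats and ignores the vocabulary dictionary and any temporary parsing overhead, so treat it as a lower bound. The helper name estimate_matrix_bytes and the header "2000000 300" are hypothetical; read the real first line of your .vec file instead.

```python
def estimate_matrix_bytes(header_line, limit=None, bytes_per_value=4):
    """Lower-bound estimate of the vector matrix size in bytes.

    Hypothetical helper: assumes one `bytes_per_value`-sized float
    per dimension and nothing else.
    """
    vocab_size, dim = (int(x) for x in header_line.split())
    n = vocab_size if limit is None else min(vocab_size, limit)
    return n * dim * bytes_per_value


# Made-up header for illustration: 2,000,000 words, 300 dimensions.
header = "2000000 300"
full = estimate_matrix_bytes(header)
capped = estimate_matrix_bytes(header, limit=100000)
print(f"full: {full / 1e9:.2f} GB, limit=100000: {capped / 1e9:.2f} GB")
```

With these assumed numbers, capping at 100,000 words shrinks the matrix by a factor of 20.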

gojomo