Loading fastText's pre-trained German word embeddings (.vec file) throws an out-of-memory error

Question

I am using gensim to load fastText's pre-trained word embeddings:

from gensim.models import KeyedVectors
de_model = KeyedVectors.load_word2vec_format(r'wiki.de\wiki.de.vec')  # raw string keeps the backslash literal

But this gives me a memory error.

Is there any way I can load it?

score 6 · Accepted Answer · answered Jun 18 '18 at 22:20

Besides working on a machine with more memory, you can use the limit option of gensim's load_word2vec_format() method. It takes a count n of vectors to read, and only the first n vectors of the file will be loaded.

For example, to load just the first 100,000 words:

de_model = KeyedVectors.load_word2vec_format(r'wiki.de\wiki.de.vec', limit=100000)

Since such files usually sort the more-frequent words first, and the 'long tail' of rarer words tends to have weaker vectors, many applications don't lose much power by discarding the rarer words.
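To pick a value for limit, it can help to estimate how much memory the vectors need. A rough sketch, using only the standard library: the first line of a word2vec-format text file holds the vector count and the dimensionality, and gensim stores vectors as 32-bit floats by default, so the raw matrix takes about count × dimensions × 4 bytes. The helper names (read_vec_header, estimate_array_bytes, limit_for_budget) and the tiny demo file are made up for illustration, and the estimate ignores the extra memory used by the vocabulary itself.

```python
import os
import tempfile


def read_vec_header(path):
    """Return (vocab_size, dim) from the first line of a word2vec-format text file."""
    with open(path, encoding="utf-8") as f:
        vocab_size, dim = f.readline().split()
    return int(vocab_size), int(dim)


def estimate_array_bytes(n_vectors, dim, bytes_per_value=4):
    """Approximate size of the raw vector matrix (4 bytes per value for float32)."""
    return n_vectors * dim * bytes_per_value


def limit_for_budget(path, budget_bytes):
    """Largest vector count whose raw matrix fits in budget_bytes (hypothetical helper)."""
    vocab_size, dim = read_vec_header(path)
    return min(vocab_size, budget_bytes // estimate_array_bytes(1, dim))


# Demonstration on a tiny stand-in file; a real .vec file has the same layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.vec")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("3 4\n")                      # header: 3 vectors, 4 dimensions
        f.write("der 0.1 0.2 0.3 0.4\n")
        f.write("die 0.5 0.6 0.7 0.8\n")
        f.write("und 0.9 1.0 1.1 1.2\n")

    demo_header = read_vec_header(demo_path)
    # Each 4-dimensional float32 vector needs 16 bytes, so 32 bytes fit 2 vectors.
    demo_limit = limit_for_budget(demo_path, budget_bytes=32)

print(demo_header, demo_limit)
```

The value returned by limit_for_budget could then be passed as limit= to load_word2vec_format(). As a sanity check on the arithmetic, 100,000 vectors of 300 dimensions come to roughly 120 MB of raw array data.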
