I am using gensim to load fastText's pre-trained word embeddings:

from gensim.models import KeyedVectors
de_model = KeyedVectors.load_word2vec_format('wiki.de/wiki.de.vec')

But this gives me a memory error. Is there any way I can load it?
Other than working on a machine with more memory, the gensim load_word2vec_format() method has a limit option, which takes a count n of vectors to read; only the first n vectors in the file will be loaded.
For example, to load just the first 100,000 words:

de_model = KeyedVectors.load_word2vec_format('wiki.de/wiki.de.vec', limit=100000)
Since such files usually sort the more-frequent words first, and the 'long tail' of rarer words tends to consist of weaker vectors, many applications don't lose much power by discarding the rarer words.
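To sanity-check what a truncated load gives you, something like the sketch below can help (assuming gensim 4.x, and that a common word such as 'haus' falls within the first 100,000 entries). The wiki.de vectors are 300-dimensional, so 100,000 float32 vectors take roughly 100,000 × 300 × 4 bytes ≈ 120 MB of raw vector data:

from gensim.models import KeyedVectors

# Load only the first 100,000 vectors instead of every vector in the file
de_model = KeyedVectors.load_word2vec_format('wiki.de/wiki.de.vec', limit=100000)

print(len(de_model.index_to_key))  # 100000 -- number of words actually loaded
print(de_model.vector_size)        # 300 -- dimensionality of the wiki.de vectors

# Nearest neighbors should still look reasonable for frequent words
print(de_model.most_similar('haus', topn=5))

If the quality of results on your task is acceptable with the truncated model, you can stop there; otherwise, try raising the limit until you hit your machine's memory ceiling.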