When I created a word2vec model (skip-gram with negative sampling), I received three output files:

word2vec (File)
word2vec.syn1neg.npy (NPY file)
word2vec.wv.syn0.npy (NPY file)

I am wondering why this happens, since in my previous word2vec tests I received only a single model file (no .npy files).

Please help me.

gojomo

1 Answer

Models with larger internal vector-arrays can't be saved via Python 'pickle' to a single file, so beyond a certain threshold, the gensim save() method will store subsidiary arrays in separate files, using the more-efficient raw format of numpy arrays (.npy format).

You still load() the model by just specifying the root model filename; when the subsidiary arrays are needed, the loading code will find the side files – as long as they're kept beside the root file. So when moving a model elsewhere, be sure to keep all files with the same root filename together.
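As a minimal sketch of keeping the files together when moving a model: the helper below copies the root file plus every sibling file whose name starts with the root filename followed by a dot. The function name `copy_model_files` is hypothetical (it is not part of gensim), and the `<root>.<suffix>` naming is assumed from the file listing in the question.

```python
import glob
import shutil
from pathlib import Path


def copy_model_files(root_file, dest_dir):
    """Copy a saved model's root file and its side files into dest_dir.

    Hypothetical helper, not a gensim function. It assumes side files sit
    next to the root file and are named "<root filename>.<suffix>",
    e.g. word2vec and word2vec.wv.syn0.npy.
    """
    root = Path(root_file)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Root file first, then every sibling named "<root>.<anything>".
    # glob.escape guards against glob metacharacters in the filename.
    pattern = glob.escape(root.name) + ".*"
    candidates = [root] + sorted(root.parent.glob(pattern))

    copied = []
    for src in candidates:
        if src.is_file():
            shutil.copy2(src, dest / src.name)
            copied.append(src.name)
    return copied
```

After copying, the model should still be loadable by pointing `load()` at the root file in the new directory, since the side files are again beside it.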

gojomo
  • Thanks a lot for the great answer :) –  Nov 10 '17 at 03:22
  • Can you point me towards what exactly is stored in each of the individual files? – Saurav Mukherjee Jan 29 '18 at 00:09
  • You should consult the source code for that information: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py & https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py will show what `syn1neg`, `syn0`, and the other properties that might be stored as separate files are. – gojomo Jan 30 '18 at 02:19