
I am trying to train my model with data that is 50 MB in size. I was just wondering if there is a rule/algorithm for determining the embedding dimension for the word vectors.

gaurus
  • Possible duplicate of [Word2Vec: Number of Dimensions](https://stackoverflow.com/questions/26569299/word2vec-number-of-dimensions) – Abu Shoeb Sep 01 '18 at 04:31

1 Answer


I would estimate that a 50 MB text file contains about 500,000 sentences, or roughly 5 million tokens. That is far too small to train a meaningful embedding; however, here is empirical data (trained on 6 billion tokens) that you can refer to.

[Figure from the GloVe paper: task accuracy plotted against vector dimension]

Source: https://nlp.stanford.edu/pubs/glove.pdf
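For reference, here is a minimal sketch of how you would set that dimension when training with gensim (assuming gensim 4.x, where the parameter is called `vector_size`; the corpus path is hypothetical):

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Hypothetical corpus: one sentence per line, whitespace-tokenized.
sentences = LineSentence("corpus.txt")

# With only ~5M tokens, a smaller dimension (e.g. 100) is usually a
# safer choice than the 300 that works well on billion-token corpora.
model = Word2Vec(
    sentences,
    vector_size=100,  # the embedding dimension being asked about
    window=5,         # context window size
    min_count=5,      # drop words seen fewer than 5 times
    workers=4,
)

model.save("word2vec.model")
```

In practice, `vector_size` is treated as a hyperparameter to tune against a downstream task rather than something derived from a formula.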

aerin