
I am trying to train my model with data that is 50 MB in size. I was just wondering if there is a rule/algorithm for determining the embedding dimension for the word vectors.

gaurus
  • Possible duplicate of [Word2Vec: Number of Dimensions](https://stackoverflow.com/questions/26569299/word2vec-number-of-dimensions) – Abu Shoeb Sep 01 '18 at 04:31

1 Answer


I would estimate that a 50 MB text file contains about 500,000 sentences, or roughly 5 million tokens. That is far too small to train a meaningful embedding; however, here is empirical data (trained on 6 billion tokens) that you can refer to.

[Figure from the GloVe paper: task accuracy plotted against vector dimension]

Source: https://nlp.stanford.edu/pubs/glove.pdf
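For reference, here is a minimal sketch of how you would set that dimension when training with gensim (assuming gensim 4.x, where the parameter is called `vector_size`; the corpus path is hypothetical):

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Hypothetical corpus: one sentence per line, whitespace-tokenized.
sentences = LineSentence("corpus.txt")

# With only ~5M tokens, a smaller dimension (e.g. 100) is usually a
# safer choice than the 300 that works well on billion-token corpora.
model = Word2Vec(
    sentences,
    vector_size=100,  # the embedding dimension being asked about
    window=5,         # context window size
    min_count=5,      # drop words seen fewer than 5 times
    workers=4,
)

model.save("word2vec.model")
```

In practice, `vector_size` is treated as a hyperparameter to tune against a downstream task rather than something derived from a formula.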

aerin