I'm working on word embeddings and am a little confused about the number of dimensions of the word vectors. Taking word2vec as an example, my question is: why should we use, let's say, 100 hidden neurons for the hidden layer? Does this number have any meaning or logic behind it? Or, if it is arbitrary, why not 300? Or 10? Why not more or less? As we all know, the simplest way to display vectors is in a 2-dimensional space (only X and Y), so why use more dimensions? I read some resources about it, and in one example they chose 100 dimensions, while in others they chose other numbers like 150, 200, 80, etc.
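For example, as far as I can tell the dimensionality is just a hyperparameter you pick when training (a minimal sketch, assuming gensim 4.x and a toy corpus I made up):

```python
from gensim.models import Word2Vec

# toy corpus just for illustration; a real corpus would be far larger
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
]

# vector_size is the number of dimensions (the hidden-layer size).
# It is set to 100 here, but nothing stops you from using 10 or 300.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

print(model.wv["cat"].shape)  # -> (100,)
```

So the library happily accepts any value, which is exactly why I'm wondering what makes one choice better than another.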
I know that the larger the number, the bigger the space for representing relations between words, but couldn't we represent those relations in a 2-dimensional vector space (only X and Y)? Why do we need a bigger space? Each word is represented by a vector, so why do we have to use a high-dimensional space when we could display vectors in a 2- or 3-dimensional space? It would then be simpler to use similarity techniques like cosine similarity to find similarities in 2 or 3 dimensions rather than 100 (from a computation-time viewpoint), right?
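To make the computation-time point concrete, here is a rough sketch (plain NumPy, random made-up vectors) showing that the cosine formula is identical in 3 and 100 dimensions, with the cost growing only linearly in the dimensionality:

```python
import numpy as np

def cosine(a, b):
    # cosine similarity: dot product divided by the product of the norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)

# same formula, different dimensionality; only the dot-product length changes
for d in (3, 100):
    u, v = rng.normal(size=d), rng.normal(size=d)
    print(d, cosine(u, v))
```

Given that the extra dimensions do cost something, I'd like to understand what we actually gain by paying for them.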