
If I pass a sentence containing 5 words to the Doc2Vec model with the size set to 100, I get 100 vectors. I don't understand what those vectors are. If I increase the size to 200, there are 200 vectors for just one simple sentence. Please tell me how those vectors are calculated.

Yash Ghorpade

1 Answer

When using size=100, there are not "100 vectors" per text example; there is one vector, which has 100 scalar dimensions (each a floating-point value, like 0.513 or -1.301).
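
To make the "one vector, many dimensions" point concrete, here is a minimal plain-Python sketch. It does not run Doc2Vec: the hypothetical helper `toy_doc_vector` just averages made-up per-word vectors (a much simpler stand-in technique), but its output has the same shape as a Doc2Vec document vector. You get one list whose length equals the size setting, no matter how many words the text has. (In gensim, `infer_vector()` likewise returns a single 1-D array. Current gensim versions call the size parameter `vector_size`.)

```python
import random

def toy_word_vector(word, vector_size):
    # Hypothetical: a deterministic pseudo-random vector per word (seeded by the word itself).
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(vector_size)]

def toy_doc_vector(words, vector_size):
    # Stand-in technique (NOT Doc2Vec): average the word vectors dimension by dimension.
    vectors = [toy_word_vector(w, vector_size) for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

sentence = ["the", "cat", "sat", "on", "mats"]  # 5 words
v100 = toy_doc_vector(sentence, 100)
v200 = toy_doc_vector(sentence, 200)

# One vector per text; its length is the number of dimensions, not a count of vectors.
print(len(v100), len(v200))
```

Reading `v100[0]` gives one scalar dimension of the single vector, not one of 100 separate vectors.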

Note that each vector represents a point in 100-dimensional space, and the individual dimensions/axes don't have easily interpretable meanings. Rather, only the relative distances and relative directions between vectors have useful meaning for text-based applications, such as information retrieval or automatic classification.
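
A small sketch of why only relative positions matter, using made-up 100-dimensional vectors rather than trained ones. Cosine similarity between documents stays the same if every vector's axes are permuted and sign-flipped in the same way, even though every individual coordinate changes. The names `cosine` and `scramble_axes` are illustrative, not part of any library.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity: depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def scramble_axes(vec, perm, signs):
    """Permute the axes and flip some signs: an orthogonal transform that
    changes every coordinate but preserves all lengths and angles."""
    return [signs[i] * vec[perm[i]] for i in range(len(vec))]

rng = random.Random(42)
dim = 100
doc_a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # made-up vectors, not trained ones
doc_b = [x + rng.gauss(0.0, 0.3) for x in doc_a]   # deliberately close to doc_a
doc_c = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # unrelated to doc_a

perm = list(range(dim))
rng.shuffle(perm)
signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]

before = (cosine(doc_a, doc_b), cosine(doc_a, doc_c))
new_a = scramble_axes(doc_a, perm, signs)
after = (cosine(new_a, scramble_axes(doc_b, perm, signs)),
         cosine(new_a, scramble_axes(doc_c, perm, signs)))

print("original axes: ", [round(s, 3) for s in before])
print("scrambled axes:", [round(s, 3) for s in after])
```

Both lines show the same similarities (up to float rounding). Roughly speaking, a dot-product-based training objective scores equally well under such a rearrangement of all its vectors, so nothing forces any single dimension to carry a fixed meaning.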

The method for computing the vectors is described in the paper 'Distributed Representations of Sentences and Documents' by Le & Mikolov. But it is closely related to the 'word2vec' algorithm, so understanding that first may help, for example via the first and second word2vec papers. If academic papers aren't your style, searches like [word2vec tutorial], [how does word2vec work], or [doc2vec intro] should find more casual introductory descriptions.

gojomo
  • Okay, one thing is clear: it is just 1 vector, not 100 vectors. I'll go through the tutorials too, to see how those scalar dimensions are calculated. – Yash Ghorpade Apr 19 '18 at 04:35
  • The short version is: they start as random, low-magnitude values, but then iterative training of the neural network nudges them to be better and better at predicting nearby words during training. So there's no strong meaning to the final values, other than: "these worked pretty well for prediction, balanced against all the other word-vectors trying to be good at prediction as well". It turns out that the end result of that process arranges vectors at useful relative distances/directions. – gojomo Apr 20 '18 at 05:05
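
The "start random, then nudge" process can be sketched in plain Python. This is a toy in the spirit of the PV-DBOW mode from the Le & Mikolov paper: each document's vector is trained by plain gradient descent to predict the words of that document. Assumptions: a made-up three-document corpus, a full softmax with no bias term (real implementations such as gensim use hierarchical softmax or negative sampling for speed), and an initialization scale chosen only for illustration. All names (`D`, `U`, `softmax_probs`, `loss`) are hypothetical.

```python
import math
import random

# Made-up corpus: each document is a list of tokens.
docs = [["cat", "sat", "mat"], ["dog", "sat", "log"], ["stock", "price", "fell"]]
vocab = sorted({w for d in docs for w in d})
index = {w: i for i, w in enumerate(vocab)}
dim, lr, epochs = 8, 0.1, 300
rng = random.Random(0)

# Start: random, low-magnitude values for document vectors D and output weights U.
D = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in docs]
U = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]

def softmax_probs(d):
    # Probability of each vocabulary word given a document vector d.
    scores = [sum(u_k * d_k for u_k, d_k in zip(u, d)) for u in U]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss():
    # Total negative log-likelihood of every word in every document.
    return -sum(math.log(softmax_probs(D[j])[index[w]])
                for j, doc in enumerate(docs) for w in doc)

start = loss()
for _ in range(epochs):
    for j, doc in enumerate(docs):
        for word in doc:
            p = softmax_probs(D[j])
            target = index[word]
            # d(-log p[target]) / d(score_i) = p[i] - 1[i == target]
            err = [p[i] - (1.0 if i == target else 0.0) for i in range(len(vocab))]
            grad_d = [sum(err[i] * U[i][k] for i in range(len(vocab))) for k in range(dim)]
            # Nudge the output weights, then the document vector, a small step downhill.
            for i in range(len(vocab)):
                for k in range(dim):
                    U[i][k] -= lr * err[i] * D[j][k]
            for k in range(dim):
                D[j][k] -= lr * grad_d[k]
end = loss()
print(f"loss {start:.3f} -> {end:.3f}")
```

The loss drops as the vectors are nudged. The final coordinates in `D` depend on the random seed, which fits the point above: only how the vectors end up arranged relative to one another is useful.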