I have created a doc2vec model with 100 dimensions. From my reading, I understand that these dimensions are features of my model. How can I identify what exactly these dimensions are?
1 Answer
The 'Paragraph Vectors' algorithms behind Doc2Vec simply give documents vectors that are interesting in their distance/directional arrangement in comparison to other co-trained document vectors.
The individual dimensions don't have specific interpretable meanings. As with Word2Vec, there may be 'neighborhoods' of related items, and certain directions may vaguely map to understandable concepts.
But those directions aren't directly aligned with the individual perpendicular dimensions of the coordinate space. And there's nothing in the process that helps you describe those directional tendencies. (They tend to show up when you take differences between vectors, as in analogy-solving problems.)
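One way to convince yourself that the individual axes carry no special meaning is the small sketch below. It uses random stand-in vectors (not vectors from a trained model) and a hypothetical `rotate` helper: rotating every vector by the same angle changes the individual coordinates but leaves the cosine similarity between documents unchanged, so a rotated space would be just as good for similarity comparisons.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rotate(vec, i, j, theta):
    """Rotate vec by angle theta in the plane spanned by axes i and j."""
    out = list(vec)
    c, s = math.cos(theta), math.sin(theta)
    out[i] = c * vec[i] - s * vec[j]
    out[j] = s * vec[i] + c * vec[j]
    return out

# Assumption: random 100-dimensional stand-ins for two document vectors.
random.seed(0)
doc_a = [random.gauss(0, 1) for _ in range(100)]
doc_b = [random.gauss(0, 1) for _ in range(100)]

# Apply the same rotation to both vectors.
rot_a = rotate(doc_a, 0, 1, 0.7)
rot_b = rotate(doc_b, 0, 1, 0.7)

before = cosine(doc_a, doc_b)
after = cosine(rot_a, rot_b)

print("dimension 0 before/after rotation:", doc_a[0], rot_a[0])
print("cosine similarity before/after:   ", before, after)
```

The value in "dimension 0" changes, while the relationship between the two documents does not. That is the sense in which only the relative arrangement of the vectors is meaningful.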
You can see an example in the 'Document Embedding With Paragraph Vectors' paper, Table 2, where Japanese pop artists who are (perhaps) similar to 'Lady Gaga' are discovered by shifting in space in the direction of -'American'+'Japanese'. That is, there's no one dimension that is Japanese-vs-American, but there is a directional tendency across all dimensions.
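To make the idea of a direction spread across all dimensions concrete, here is a toy sketch. Every vector and name in it (`us_pop_artist`, `jp_pop_artist`, and so on) is made up for illustration and is not taken from the paper or from any trained model. The vectors are constructed so that the 'American'-to-'Japanese' shift touches every coordinate, and the nearest neighbour of the shifted vector is then looked up by cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Assumption: hand-made 4-dimensional toy vectors, purely illustrative.
vectors = {
    "us_pop_artist":  [0.0, -1.0, 1.0, 0.0],
    "jp_pop_artist":  [1.0, 0.0, 0.0, -1.0],
    "us_rock_artist": [0.0, -1.0, 0.0, 1.0],
    "jp_rock_artist": [1.0, 0.0, -1.0, 0.0],
}
american = [-0.5, -0.5, 0.5, 0.5]
japanese = [0.5, 0.5, -0.5, -0.5]

# The shift -'American' + 'Japanese' is non-zero in every coordinate:
# no single axis is "the nationality dimension".
shift = [j - a for a, j in zip(american, japanese)]

# Move the query artist along that direction.
query_name = "us_pop_artist"
query = [x + d for x, d in zip(vectors[query_name], shift)]

# Rank the other artists by cosine similarity to the shifted vector.
candidates = {name: vec for name, vec in vectors.items() if name != query_name}
best = max(candidates, key=lambda name: cosine(query, candidates[name]))

print("shift direction:", shift)
print("nearest to shifted vector:", best)
```

With real models the same arithmetic is done on learned vectors, where the match is approximate rather than exact as in this constructed example.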
