Python - Visualize Doc2Vec multi-dimensional Vectors in 2D with sklearn MDS class

Question

For a simple evaluation of my trained Doc2Vec model, I need to transform 400-dimensional vectors to 2 dimensions and visualize the documents as a set of nodes, where the distance between any two nodes is inversely proportional to their similarity (nodes that are highly similar are close together).
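"Similar documents sit close together" first means turning a similarity into a distance. Here is a minimal sketch, assuming cosine similarity (the usual choice for Doc2Vec vectors) and using cosine distance, 1 - cos(a, b). That distance is not literally inversely proportional to similarity, but it satisfies "more similar means closer". The function name `cosine_distance_matrix` is made up for illustration.

```python
import numpy as np

def cosine_distance_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance (1 - cosine similarity) between row vectors.

    Documents with cosine similarity near 1 get a distance near 0, so a layout
    that preserves these distances puts them close together.
    """
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms                 # L2-normalise each row
    similarity = unit @ unit.T             # cosine similarity matrix
    # Clip small out-of-range values caused by floating-point rounding.
    return np.clip(1.0 - similarity, 0.0, 2.0)

# Toy example: rows 0 and 1 point the same way, row 2 is orthogonal to both.
vecs = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
d = cosine_distance_matrix(vecs)
```

Note that this builds an n x n matrix, so it is only practical for a small subset of documents.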

After some searching, I found MDS (multidimensional scaling) and the sklearn MDS class for it.
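For reference, the basic call pattern of `sklearn.manifold.MDS` takes an (n_samples, n_features) array. With the default Euclidean dissimilarity it computes all pairwise distances internally. The sketch below uses 50 random 400-dimensional vectors as a stand-in for real Doc2Vec output (placeholder data, not actual document vectors).

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
# Placeholder for Doc2Vec output: 50 documents x 400 dimensions (random data).
doc_vectors = rng.normal(size=(50, 400))

# n_components=2 gives 2-D coordinates; with the default Euclidean dissimilarity,
# MDS builds the full 50 x 50 distance matrix internally before optimising.
mds = MDS(n_components=2, n_init=4, random_state=0)
coords = mds.fit_transform(doc_vectors)   # shape (50, 2), one point per document
```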

Now I have 2.2M vectors, each with 400 dimensions, and I don't know how to pass them to the sklearn MDS function with the correct syntax and at the lowest cost. I know that creating a similarity matrix between 2.2M vectors is impossible.
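To put a number on "impossible": a dense float64 matrix for 2.2M documents has 2.2M x 2.2M entries, which comes to 38.72 TB. sklearn's MDS works on such a full pairwise matrix, so one common workaround (an assumption here, not something from the question) is to run MDS on a random sample of documents. The vectors are L2-normalised first so that Euclidean distance tracks cosine similarity: for unit vectors, ||a - b||^2 = 2 - 2 cos(a, b). The helper names `dense_matrix_bytes` and `mds_on_sample` are hypothetical.

```python
import numpy as np
from sklearn.manifold import MDS

def dense_matrix_bytes(n_docs: int, bytes_per_entry: int = 8) -> int:
    """Memory needed for a full n x n float64 dissimilarity matrix."""
    return n_docs * n_docs * bytes_per_entry

print(f"{dense_matrix_bytes(2_200_000) / 1e12:.2f} TB")  # 38.72 TB

def mds_on_sample(vectors: np.ndarray, sample_size: int = 1000, seed: int = 0):
    """Hypothetical helper: embed a random sample of documents in 2-D with MDS.

    Returns the sampled row indices and their 2-D coordinates.
    """
    rng = np.random.default_rng(seed)
    n = min(sample_size, len(vectors))
    idx = rng.choice(len(vectors), size=n, replace=False)
    sample = vectors[idx]
    # Unit-normalise so Euclidean distance is monotone in cosine similarity.
    sample = sample / np.linalg.norm(sample, axis=1, keepdims=True)
    coords = MDS(n_components=2, n_init=1, random_state=seed).fit_transform(sample)
    return idx, coords

# Usage with placeholder data standing in for the real 2.2M x 400 array.
rng = np.random.default_rng(1)
fake_vectors = rng.normal(size=(300, 400))
idx, coords = mds_on_sample(fake_vectors, sample_size=100)
```

The sample size is the main knob: memory grows with its square, so it has to stay far below the full corpus size.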

score 0 · Answer 1 · answered Feb 16 '17 at 21:05

For a rather similar task, I found that reducing the dimensionality of Doc2Vec (from the default 100 to 30 in our case) was absolutely crucial for any sort of spatial reconstruction on a MacBook Pro, even for a relatively small dataset.

This was a good starting point (albeit with t-SNE reduction and outdated interfaces).
