
For a simple evaluation of my trained Doc2Vec model, I need to reduce 400-dimensional vectors to 2 dimensions and visualize the documents as a set of nodes, where the distance between any two nodes shrinks as their similarity grows (highly similar nodes sit close together).

After some searching, I found MDS (multidimensional scaling) and the sklearn MDS implementation of it.

I now have 2.2M vectors, each with 400 dimensions, and I don't know how to pass them to the sklearn MDS function with the correct syntax and at the lowest cost. I know that building a full similarity matrix between 2.2M vectors is not feasible.
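To put a number on why the full matrix is out of reach, here is a rough back-of-the-envelope sketch. It assumes one 8-byte float per pair of vectors (a dense float64 matrix); the helper name `pairwise_matrix_bytes` is made up for illustration and is not part of sklearn.

```python
def pairwise_matrix_bytes(n_vectors, bytes_per_value=8):
    """Size of a dense n x n pairwise matrix, assuming 8-byte floats."""
    return n_vectors * n_vectors * bytes_per_value


# Full corpus: 2.2M vectors
full = pairwise_matrix_bytes(2_200_000)
print(f"{full / 1e12:.1f} TB")  # 38.7 TB

# A random sample of 2,000 vectors, for comparison
sample = pairwise_matrix_bytes(2_000)
print(f"{sample / 1e6:.0f} MB")  # 32 MB
```

So any approach that needs all pairwise distances at once would only be practical on a small sample of the documents.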

Mahmood Kohansal

1 Answer


For a rather similar task, I found that reducing the dimensionality of the Doc2Vec vectors (from the default 100 to 30 in our case) was absolutely crucial for any sort of spatial reconstruction on a MacBook Pro, even for a relatively small dataset.

This was a good starting point (albeit with t-SNE reduction and outdated interfaces).
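As a rough illustration of the idea (not the code I actually used), something along these lines might work with sklearn. Note the assumptions: PCA stands in here for retraining Doc2Vec with a smaller vector size, MDS is run only on a random sample because it needs all pairwise distances in memory, and the function name `project_sample_2d` and its default values are made up for this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS


def project_sample_2d(vectors, sample_size=2000, reduced_dim=30, seed=0):
    """Subsample rows, shrink them with PCA, then embed the sample in 2-D with MDS.

    vectors: array of shape (n_documents, n_dimensions).
    Returns the sampled row indices and their 2-D coordinates.
    """
    rng = np.random.default_rng(seed)
    n = vectors.shape[0]
    k = min(sample_size, n)

    # Random sample without replacement; MDS cost grows quadratically with k.
    idx = rng.choice(n, size=k, replace=False)
    sample = vectors[idx]

    # PCA cannot keep more components than min(n_samples, n_features).
    n_components = min(reduced_dim, sample.shape[1], k)
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(sample)

    # By default MDS computes Euclidean distances between the rows it is given.
    coords = MDS(n_components=2, random_state=seed).fit_transform(reduced)
    return idx, coords
```

With the document vectors in a NumPy array `doc_vectors` (hypothetical name), `idx, coords = project_sample_2d(doc_vectors)` gives 2-D points that can be passed to a scatter plot. The sample size is a trade-off to tune against available memory and patience.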

ellimilial