"Snapping" the Results of t-SNE to a Regular Grid - Scalability Issues

Question

I'm trying to use t-SNE to arrange images by their visual similarity, like this cool example for emojis (source):

But the output of t-SNE is just a "point cloud", whereas my goal is to display the images in a regular, dense, near-square grid. So I need some way to convert the t-SNE output into (x, y) locations on that grid.

So far, I've followed the suggestion in this great blog post and formulated the snapping as a linear assignment problem that finds the best embedding into a regular grid. I'm pleased with the results, for example:
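The blog post isn't linked here, so this is only a minimal sketch of one common way to set up that assignment problem, not necessarily what the post does. It rescales the t-SNE coordinates to the unit square, lays a near-square grid with cols = ceil(sqrt(n)) and rows = ceil(n / cols) over it, and uses the squared Euclidean distance from each image to each cell center as the cost. The class and method names (GridCost, normalize, costMatrix) are hypothetical.

```java
import java.util.Arrays;

public class GridCost {

    /** Rescales each axis of the 2-D points to [0, 1], in place. */
    static void normalize(double[][] pts) {
        for (int d = 0; d < 2; d++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] p : pts) {
                min = Math.min(min, p[d]);
                max = Math.max(max, p[d]);
            }
            double range = max - min;
            for (double[] p : pts) {
                // A degenerate axis (all values equal) is mapped to the middle.
                p[d] = range > 0 ? (p[d] - min) / range : 0.5;
            }
        }
    }

    /**
     * Square cost matrix for a near-square grid (assumes n >= 1): cost[i][j] is the
     * squared distance from point i to the center of cell j. Rows i >= n are
     * zero-cost dummy points that absorb the empty cells when rows * cols > n.
     */
    static double[][] costMatrix(double[][] pts) {
        int n = pts.length;
        int cols = (int) Math.ceil(Math.sqrt(n));
        int rows = (int) Math.ceil((double) n / cols);
        int m = rows * cols;
        double[][] cost = new double[m][m]; // dummy rows stay zero-initialized
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double cx = (j % cols + 0.5) / cols; // cell column -> x center
                double cy = (j / cols + 0.5) / rows; // cell row -> y center
                double dx = pts[i][0] - cx;
                double dy = pts[i][1] - cy;
                cost[i][j] = dx * dx + dy * dy;
            }
        }
        return cost;
    }

    public static void main(String[] args) {
        double[][] pts = {{-3.0, 1.0}, {2.0, 4.0}, {5.0, -2.0}};
        normalize(pts);
        double[][] cost = costMatrix(pts);
        System.out.println(cost.length + "x" + cost[0].length);
        System.out.println(Arrays.toString(cost[0]));
    }
}
```

The resulting m x m matrix is what a Jonker-Volgenant solver would take as input; whichever cells end up assigned to the dummy rows (i >= n) are simply left empty.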

My problem is that the "snap to grid" stage turns out to be a huge bottleneck, and I need my method to scale to a large number of images (10K). To solve the linear assignment problem I'm using a Java implementation of the Jonker-Volgenant algorithm, whose time complexity is O(n^3). So while t-SNE runs in O(n log n) and scales well to 10K images, the grid-alignment step can only handle about 2K images.
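To put the bottleneck in numbers: with the stated complexities, growing from 2K to 10K images multiplies the assignment cost by about 125 but the t-SNE cost by only about 6 (ignoring constant factors). The memory line assumes the solver holds a dense n x n cost matrix of doubles, which is an assumption about the Java implementation, not something stated above.

```latex
\begin{aligned}
\text{assignment, } O(n^3):&\quad \left(\tfrac{10\,000}{2\,000}\right)^{3} = 5^{3} = 125\\
\text{t-SNE, } O(n\log n):&\quad \tfrac{10\,000\,\log 10\,000}{2\,000\,\log 2\,000} \approx 5 \times 1.21 \approx 6.1\\
\text{dense cost matrix (8-byte doubles):}&\quad 10\,000^{2} \times 8\ \text{B} = 8 \times 10^{8}\ \text{B} \approx 800\ \text{MB}
\end{aligned}
```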

Potential solutions, as I see them:

1. Randomly sample 2K images out of the total 10K.
2. Divide the 10K images into 5 groups and create 5 maps. This is problematic because of a chicken-and-egg problem: how do I divide them well?
3. Trade accuracy for performance: solve the linear assignment problem approximately in near-linear time. I want to try this, but I couldn't find any existing implementation to use.
4. Implement the "snap to grid" part in a different, more efficient way.
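For option 4, one very cheap alternative is to snap by sorting. This is a simple heuristic, not an approximate LAP solver with any optimality guarantee: order the images by their t-SNE y coordinate, cut that order into rows of cols = ceil(sqrt(n)) images, and sort each row by x. It runs in O(n log n) and always produces a dense grid, but it can separate neighbors that sit near a row boundary, so it will usually look worse than the exact assignment. A minimal sketch, with a hypothetical class name SortSnap:

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortSnap {

    /**
     * Heuristic O(n log n) snap: returns cell[i] = {row, col} for each point.
     * Row 0 holds the smallest y values (flip it if your display's y axis points down).
     * Every cell is used at most once, so the layout is dense, but this is NOT the
     * optimal linear assignment.
     */
    static int[][] snap(double[][] pts) {
        int n = pts.length;
        int cols = (int) Math.ceil(Math.sqrt(n));
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // A global sort by y decides which row each point lands in.
        Arrays.sort(order, Comparator.comparingDouble(i -> pts[i][1]));
        int[][] cell = new int[n][];
        for (int start = 0; start < n; start += cols) {
            int end = Math.min(start + cols, n); // the last row may be partial
            // Within one row, a sort by x decides the column.
            Arrays.sort(order, start, end, Comparator.comparingDouble(i -> pts[i][0]));
            for (int k = start; k < end; k++) {
                cell[order[k]] = new int[] {start / cols, k - start};
            }
        }
        return cell;
    }

    public static void main(String[] args) {
        double[][] pts = {{0.9, 0.1}, {0.1, 0.0}, {0.2, 0.8}, {1.0, 0.9}};
        int[][] cell = snap(pts);
        for (int i = 0; i < pts.length; i++) {
            System.out.println("image " + i + " -> row " + cell[i][0] + ", col " + cell[i][1]);
        }
    }
}
```

An untested variant that may also sidestep the chicken-and-egg problem in option 2: take the same y-sorted order, cut it into bands of b images (several rows each, e.g. about 2K), and run the exact Jonker-Volgenant solver within each band. Because the partition comes from the t-SNE layout itself, most neighboring images should land in the same band, and the total cost drops from O(n^3) to roughly (n/b) · O(b^3) = O(n b^2).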

I'm working in Java, but C++ solutions are also fine. I'm guessing I'm not the first to try this. Any suggestions or thoughts?

Thanks!

How long does it currently take to complete? – Yay295, May 25 '17 at 00:04
Did you make any progress on this? – emh, Aug 26 '20 at 06:54
