
I'm trying to use t-SNE to arrange images based on their visual similarity, similar to this cool example for emojis (source):

[Image: the emoji example arranged by t-SNE]

But the output of t-SNE is just a "point cloud", while my goal is to display the images in a regular, near-square, dense grid. So I need to somehow convert the t-SNE output to (x, y) locations on a grid.

So far, I've followed the suggestion in this great blog post: I formulated the problem as a linear assignment problem to find the best embedding into a regular grid. I'm pleased with the results, for example:

[Image: example result of snapping the t-SNE output to a regular grid]
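
The blog post's exact cost function isn't quoted here, so as an assumption, here is a minimal sketch of the kind of cost matrix such a formulation hands to an assignment solver: each t-SNE point is rescaled to the unit square, the grid cells are spread evenly over the same square, and cost[i][j] is the squared Euclidean distance from point i to cell j. The names GridCost and buildCostMatrix are hypothetical, and the call into the Jonker-Volgenant solver is left out because its API isn't shown in the question.

```java
import java.util.Arrays;

public class GridCost {
    /**
     * Builds an n x n cost matrix for the linear assignment problem.
     * points[i] = {x, y} from t-SNE; requires points.length == rows * cols.
     * Assumption: the cost is the squared Euclidean distance between a
     * normalized point and a normalized grid cell.
     */
    public static double[][] buildCostMatrix(double[][] points, int rows, int cols) {
        int n = points.length;
        if (n != rows * cols) {
            throw new IllegalArgumentException("need exactly rows * cols points");
        }
        // Find the bounding box of the point cloud.
        double minX = Double.POSITIVE_INFINITY, maxX = Double.NEGATIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : points) {
            minX = Math.min(minX, p[0]);
            maxX = Math.max(maxX, p[0]);
            minY = Math.min(minY, p[1]);
            maxY = Math.max(maxY, p[1]);
        }
        // Guard against a degenerate (flat) cloud before dividing.
        double spanX = Math.max(maxX - minX, 1e-12);
        double spanY = Math.max(maxY - minY, 1e-12);

        double[][] cost = new double[n][n];
        for (int i = 0; i < n; i++) {
            // Rescale point i to [0, 1] on each axis so it overlays the grid.
            double px = (points[i][0] - minX) / spanX;
            double py = (points[i][1] - minY) / spanY;
            for (int j = 0; j < n; j++) {
                // Cell j is at column j % cols, row j / cols, also scaled to [0, 1].
                double gx = cols > 1 ? (double) (j % cols) / (cols - 1) : 0.5;
                double gy = rows > 1 ? (double) (j / cols) / (rows - 1) : 0.5;
                double dx = px - gx;
                double dy = py - gy;
                cost[i][j] = dx * dx + dy * dy;
            }
        }
        return cost;
    }

    public static void main(String[] args) {
        double[][] pts = { {0, 0}, {10, 0}, {0, 10}, {10, 10} };
        // prints [[0.0, 1.0, 1.0, 2.0], [1.0, 0.0, 2.0, 1.0], [1.0, 2.0, 0.0, 1.0], [2.0, 1.0, 1.0, 0.0]]
        System.out.println(Arrays.deepToString(buildCostMatrix(pts, 2, 2)));
    }
}
```

Each row of the matrix is one image and each column one grid cell, so a solver's output (a permutation assigning image i to cell j) maps image i to grid row j / cols and column j % cols.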

My problem is that the "snap to grid" stage turns out to be a huge bottleneck, and I need my method to scale to a large number of images (10K). To solve the linear assignment problem I'm using a Java implementation of the Jonker-Volgenant algorithm, whose time complexity is O(n^3). So while t-SNE is O(n log n) (with the Barnes-Hut approximation) and scales well up to 10K images, the grid-alignment step can only handle up to about 2K images.
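
A rough back-of-the-envelope estimate of why going from 2K to 10K images hurts so much, assuming the runtime is dominated by the cubic term and that the solver stores a dense n × n cost matrix of 8-byte doubles (an assumption about the Java implementation):

```latex
\frac{T(10\,000)}{T(2\,000)} \approx \left(\frac{10\,000}{2\,000}\right)^{3} = 5^{3} = 125,
\qquad
10\,000^{2} \times 8\ \text{bytes} = 8 \times 10^{8}\ \text{bytes} \approx 800\ \text{MB}.
```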

Potential solutions, as I see it:

  1. Randomly sample 2K images out of the total 10K
  2. Divide the 10K images into 5 groups and create 5 maps. This is problematic because of a "chicken and egg" problem: how do I do the division well?
  3. Trade accuracy for performance: solve the linear assignment problem approximately in near-linear time. I want to try this, but I couldn't find any existing implementations to use.
  4. Implement the "snap to grid" part in a different, more efficient way.
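
One way to explore option 4 is to replace the exact assignment with a cheap sort-based heuristic. This is a different technique from Jonker-Volgenant and does not find the optimal assignment: sort all points by y, cut the sorted list into `rows` bands of `cols` points each, then sort each band by x. It runs in O(n log n) and needs no n × n cost matrix. A minimal sketch, assuming points.length == rows * cols; SortSnap and snap are hypothetical names:

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortSnap {
    /**
     * Approximate "snap to grid" by sorting instead of solving the assignment problem.
     * points[i] = {x, y} from t-SNE; requires points.length == rows * cols.
     * Returns cell[i] = grid index (row * cols + col) assigned to point i.
     */
    public static int[] snap(double[][] points, int rows, int cols) {
        int n = points.length;
        if (n != rows * cols) {
            throw new IllegalArgumentException("need exactly rows * cols points");
        }
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // 1) Sort all point indices by y; consecutive runs of `cols` become grid rows.
        Arrays.sort(order, Comparator.comparingDouble(i -> points[i][1]));
        int[] cell = new int[n];
        for (int r = 0; r < rows; r++) {
            int from = r * cols;
            int to = from + cols;
            // 2) Within each row band, sort by x to assign columns left to right.
            Arrays.sort(order, from, to, Comparator.comparingDouble(i -> points[i][0]));
            for (int c = 0; c < cols; c++) {
                cell[order[from + c]] = r * cols + c;
            }
        }
        return cell;
    }

    public static void main(String[] args) {
        double[][] pts = { {10, 10}, {0, 10}, {10, 0}, {0, 0} };
        System.out.println(Arrays.toString(snap(pts, 2, 2))); // prints [3, 2, 1, 0]
    }
}
```

The trade-off is quality: because every band must hold exactly `cols` points, points near a band boundary can be pushed into a neighboring row, and dense clusters in the t-SNE cloud get stretched. A possible middle ground for option 2 is to use coarse bands like these as the "division" and then run the exact solver inside each block of about 2K images.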

I'm working with Java, but solutions in C++ are also good. I'm guessing I'm not the first to try this. Any suggestions? Thoughts?

Thanks!

galoosh33
