Find n nearest points with Solr in multi-dimensional space

Question

Solr experts, I'd really appreciate some advice on my problem.

I want to build a multi-dimensional space using Solr, let's say with 5 dimensions. In this space, there should be points, e.g.

P1 (0.3, 0.3, 0.3, 0.3, 0.3)
P2 (0.5, 0.5, 0.5, 0.5, 0.1)
P3 (0.5, 0.1, 0.1, 0.1, 0.1)

Now I'd like to find the point that is nearest to a given point, e.g.

Px (0.5, 0.5, 0.5, 0.5, 0.5)

I have looked for reliable information about multi-dimensional spatial search, but could not find anything helpful.

The Solr Wiki has an article about Spatial Search, but it only covers 2 dimensions.

So my question is: Does Solr provide the functionality for a multi-dimensional spatial search?

I don't think Solr handles anything above 2 dimensions. It can handle expressions, though, so you could try to adapt one of the many ways of finding the nearest neighbour in N-dimensional space into something Solr will understand, but I don't think it will be very fast. The only other way I can see it working is dividing the 5 dimensions into the 20 permutations of 5 elements taken 2 at a time (https://www.wolframalpha.com/input/?i=permutations+of+5+elements+taken+by+2), storing all 20 as distance fields, and finding the point with the smallest sum of them. — KinSlayerUY, Apr 19 '16 at 13:41
I was afraid that Solr does not provide the functionality. I think the suggestion to divide the 5 dimensions into permutations won't fulfill our requirements regarding performance. Nevertheless, thanks for your valuable answer. — theb, Apr 20 '16 at 15:28
This is supported in Lucene. I'm still trying to figure out how to do it in Solr. http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/document/DoublePoint.html — Eric Hartford, Jul 17 '17 at 04:56
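The "expressions" idea from the first comment can be sketched with Solr's `dist` function query, which takes a power followed by the coordinates of two points of any dimension. The snippet below only builds the request parameters. The field names `d1` to `d5` and the helper `nearest_points_params` are hypothetical, and each dimension is assumed to be stored in its own numeric field. Sorting by a function evaluates it for every matching document, so this is a linear scan rather than an indexed spatial lookup.

```python
from urllib.parse import urlencode


def nearest_points_params(field_names, query_point, n=1):
    """Build Solr query parameters that sort documents by Euclidean distance.

    field_names: hypothetical numeric fields, one per dimension.
    query_point: coordinates of the point to search around.
    """
    if len(field_names) != len(query_point):
        raise ValueError("one coordinate is required per field")
    # dist(power, f1..fN, v1..vN); power 2 gives the Euclidean distance.
    args = ["2", *field_names, *(repr(float(v)) for v in query_point)]
    distance = "dist(" + ",".join(args) + ")"
    return {"q": "*:*", "sort": distance + " asc", "rows": n}


fields = ["d1", "d2", "d3", "d4", "d5"]  # assumed schema, not from the question
params = nearest_points_params(fields, [0.5, 0.5, 0.5, 0.5, 0.5], n=1)
print(urlencode(params))  # query string to append to a /select request
```

Raising `n` returns the n nearest points instead of a single one.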

score 2 · Accepted Answer · answered Nov 11 '18 at 11:29

You can use either principal component analysis (PCA) or t-distributed Stochastic Neighbor Embedding (t-SNE) to reduce your 5-dimensional space to a 2-dimensional representation, and then use Solr's 2-dimensional spatial search to find the nearest neighbors of any point in your dataset.

According to this question, it seems that t-SNE is the most suitable option for your problem.

There is a Python t-SNE tutorial here, but I think this should be enough to solve your problem:

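import numpy as np
from sklearn.manifold import TSNE

# The three stored points, with the query point Px as the last row.
X = np.array([ [0.3, 0.3, 0.3, 0.3, 0.3], [0.5, 0.5, 0.5, 0.5, 0.1], [0.5, 0.1, 0.1, 0.1, 0.1], [0.5, 0.5, 0.5, 0.5, 0.5] ])
# Note: newer scikit-learn releases may require perplexity < number of samples (e.g. perplexity=2 here).
reduced_points = TSNE(n_components=2, random_state=0, angle=.99, init='pca').fit_transform(X)
reduced_points = [ [int(x[0]*100), int(x[1]*100)] for x in reduced_points ]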

You then get your points in two-dimensional space:

>>> reduced_points
[[-21020, 2023], [-12745, -16097], [-2899, 10298], [5375, -7822]]
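Neither PCA nor t-SNE preserves distances exactly, so it may be worth checking a reduced result against a brute-force search in the original space. The sketch below is a sanity check and not part of the answer's method. It ranks P1 to P3 by Euclidean distance to Px twice: once with the 5-D coordinates from the question and once with the 2-D coordinates printed above. The helper `rank_by_distance` is a hypothetical name.

```python
import math

names = ["P1", "P2", "P3"]

# Original 5-D points and query point from the question.
points_5d = [
    [0.3, 0.3, 0.3, 0.3, 0.3],
    [0.5, 0.5, 0.5, 0.5, 0.1],
    [0.5, 0.1, 0.1, 0.1, 0.1],
]
query_5d = [0.5, 0.5, 0.5, 0.5, 0.5]

# Reduced 2-D coordinates copied from the output above (last row was Px).
points_2d = [[-21020, 2023], [-12745, -16097], [-2899, 10298]]
query_2d = [5375, -7822]


def rank_by_distance(points, query):
    """Return (index, distance) pairs sorted from nearest to farthest."""
    distances = [(i, math.dist(p, query)) for i, p in enumerate(points)]
    return sorted(distances, key=lambda pair: pair[1])


ranking_5d = rank_by_distance(points_5d, query_5d)
ranking_2d = rank_by_distance(points_2d, query_2d)

for label, ranking in (("5-D", ranking_5d), ("2-D", ranking_2d)):
    print(label, [(names[i], round(d, 3)) for i, d in ranking])
```

With these particular coordinates, P2 is clearly nearest to Px in 5-D (distance 0.4), while in the reduced space P2 and P3 come out almost equidistant from Px, with P3 marginally closer. Four points are far too few for t-SNE to be meaningful, so treat this only as an illustration that the reduced neighbors are approximate.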

score 0 · Answer 2 · answered Jul 20 '17 at 13:49

This isn't supported in Solr, but it is supported in Lucene.

https://www.elastic.co/blog/lucene-points-6.0

Eric Hartford
