
I have a floor on which various sensors are placed at different locations. For every transmitting device, the sensors may detect its readings. There may be 6-7 sensors on a floor, and a particular reading may be missed by some sensors while being detected by others.

For every reading I get, I would like to identify its location on the floor. We divide the floor logically into TILEs (5x5 feet areas) and precompute what the reading at each TILE should ideally be, as detected by each sensor device (based on a transmission pathloss equation).

I am using the precomputed readings from the 'N' sensor devices at each TILE as a point in N-dimensional space. When I get a real-life reading, I find the nearest neighbours of this reading and assign it to that neighbour's location.
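To make the setup concrete, here is a minimal sketch of what I am doing. The sensor positions and the log-distance function below are made-up placeholders for my actual pathloss model:

    import numpy as np
    from scipy.spatial import cKDTree

    # Hypothetical layout: 6 sensors on a 100x100 grid of 5x5-feet tiles.
    SENSORS = np.array([[0, 0], [0, 99], [99, 0], [99, 99], [50, 50], [50, 0]])
    TILES = np.array([(x, y) for x in range(100) for y in range(100)])  # ~10K tiles

    def predicted_readings(tiles, sensors):
        """Toy stand-in for the pathloss equation: readings decay with log-distance."""
        d = np.linalg.norm(tiles[:, None, :] - sensors[None, :, :], axis=2)
        return -20.0 * np.log10(d + 1.0)

    fingerprints = predicted_readings(TILES, SENSORS)  # shape (10000, 6): one point per TILE
    tree = cKDTree(fingerprints)                       # built once

    reading = fingerprints[1234] + np.random.normal(0, 0.5, size=6)  # a noisy live reading
    dist, idx = tree.query(reading, k=1)               # nearest TILE
    print("estimated tile:", TILES[idx])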

I would like to know if there is a variant of k nearest neighbours in which a dimension can be REMOVED from consideration. This would be especially useful when a particular sensor is not reporting any reading. My understanding is that putting weights on dimensions is impossible with algorithms like kd-trees or R-trees. However, I would like to know whether it is possible to discard a dimension when computing nearest neighbours. Is there any such algorithm?

EDIT:

What I want to know is whether the same R-/kd-tree can be used for k nearest neighbour searches with different queries, where each query has different dimension weights. I don't want to construct another kd-tree for every different weighting of the dimensions.
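Per-query weights are trivial if I brute-force the search, as in the sketch below (a zero weight discards that dimension entirely); what I am asking is whether a tree index can give me the same flexibility without being rebuilt:

    import numpy as np

    def knn_weighted(points, q, w, k):
        """Brute-force k-NN under a per-query weighted squared Euclidean distance.
        Setting w[i] = 0 removes dimension i from consideration."""
        d2 = ((points - q) ** 2 * w).sum(axis=1)
        idx = np.argpartition(d2, k)[:k]   # the k smallest, unordered
        return idx[np.argsort(d2[idx])]    # ordered nearest-first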

EDIT 2:

Is there any library in Python that lets you specify a custom distance function and search for the k nearest neighbours? Essentially, I want to use a different custom distance function for each query.

motiur
Ouroboros
    Ok, let me see if I understand a couple things. First, if a sensor doesn't pick up a reading, it doesn't pick up an incorrect reading, it just reports zero, right? Second, just to clarify, you currently have a setup like this. Maybe the floor has 25 tiles. You have through experimentation determined that a reading at tile 1 gives a reading of 3 at sensor A, 4 at sensor B, etc. - a vector of values for each sensor for a given tile. Is that the case? If so, couldn't you simply use regular Euclidean distance excluding the dimensions you don't know, and the tile with the closest distance "wins"? – J Trana Dec 23 '13 at 07:12
  • @JTrana, that is precisely what I'm trying to do. However, there will be around 10K tiles and over a million device events to be tracked in one day, so I'd want the computation to be fast. – Ouroboros Dec 23 '13 at 07:32
  • I'd hope to use the approach that kd-trees / R trees take to search quickly in logarithmic time. – Ouroboros Dec 23 '13 at 07:34

2 Answers


For both R-trees and kd-trees, using weighted Minkowski norms is straightforward: just put the weights into your distance equations!

Putting weights into the Euclidean point-to-rectangle minimum distance is trivial, too: take the regular formula and plug in the weights as desired.

Distances are not used at tree construction time, so you can vary the weights as desired at query time.
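For illustration, a generic sketch of the two weighted distances involved, not tied to any particular tree library (this convention applies the weights to the p-th powers of the per-dimension differences):

    import numpy as np

    def weighted_minkowski(x, y, w, p=2):
        """Weighted Minkowski distance between two points."""
        return float((w * np.abs(x - y) ** p).sum() ** (1.0 / p))

    def weighted_mindist(q, lo, hi, w, p=2):
        """Weighted minimum distance from point q to the axis-aligned box [lo, hi],
        i.e. the pruning bound a kd-tree/R-tree evaluates at each node.
        A weight of 0 makes that dimension irrelevant to the bound."""
        gap = np.maximum(0.0, np.maximum(lo - q, q - hi))  # 0 where q overlaps the box
        return float((w * gap ** p).sum() ** (1.0 / p))

In particular, a weight of zero simply discards a dimension, which covers the case of a silent sensor.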

Has QUIT--Anony-Mousse
  • What I want to know is whether the same R-tree could be used for k nearest searches with different queries, where each query has different dimension weights. I don't want to construct another kd-tree for every different weighting of dimensions. – Ouroboros Dec 23 '13 at 13:14
  • Yes, it can. This is also obvious, as the distance function is not used/known at construction time. (If you have static weights, you may however want to have the splitting strategy use the same weights, to produce better splits for your query; but that is an optional enhancement and usually not applicable) – Has QUIT--Anony-Mousse Dec 23 '13 at 17:14
  • Okay, can you let me know how that can be accomplished in python? Any pointers? – Ouroboros Dec 23 '13 at 17:47
  • Well, implement the R-tree first ... I've only used the ELKI one, it's really fast, but Java. – Has QUIT--Anony-Mousse Dec 23 '13 at 18:31
  • How can I use weighted Minkowski norms with the scipy implementation of the kd-tree? Please point me to a link with an example, thanks. – Ouroboros Dec 24 '13 at 06:47
  • I don't use scipy. I use ELKI. – Has QUIT--Anony-Mousse Dec 25 '13 at 19:24

After going through a lot of questions on Stack Overflow, and finally digging into the scipy kd-tree source code, I realised that the answer by "celion" at the following link is correct:

KD-Trees and missing values (vector comparison)

Excerpt:
"I think the best solution involves getting your hands dirty in the code that you're working with. Presumably the nearest-neighbor search computes the distance between the point in the tree leaf and the query vector; you should be able to modify this to handle the case where the point and the query vector are different sizes. E.g. if the points in the tree are given in 3D, but your query vector is only length 2, then the "distance" between the point (p0, p1, p2) and the query vector (x0, x1) would be

sqrt( (p0-x0)^2 + (p1-x1)^2 )

I didn't dig into the java code that you linked to, but I can try to find exactly where the change would need to go if you need help.

-Chris

PS - you might not need the sqrt in the equation above, since distance squared is usually equivalent."
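If you'd rather not patch any tree internals, the same idea can be written in Python with a callable metric, which scikit-learn's NearestNeighbors accepts. A sketch under two assumptions: missing sensors are flagged with a sentinel value (sklearn's input validation rejects NaN), and algorithm='brute' is used because a per-query mask is not a proper metric, so tree-based pruning could silently go wrong:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    MISSING = -999.0  # assumed sentinel for "this sensor reported nothing"

    def masked_euclidean(p, q):
        """Euclidean distance over only the dimensions present in both vectors,
        exactly as in the excerpt above."""
        mask = (p != MISSING) & (q != MISSING)
        return float(np.sqrt(((p[mask] - q[mask]) ** 2).sum()))

    fingerprints = np.random.rand(10000, 6)  # stand-in for the per-TILE fingerprints
    nn = NearestNeighbors(n_neighbors=1, algorithm="brute", metric=masked_euclidean)
    nn.fit(fingerprints)

    query = fingerprints[42].copy()
    query[1] = MISSING                  # sensor 2 missed this reading
    dist, idx = nn.kneighbors([query])  # recovers tile 42: distance 0 on the unmasked dims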

Ouroboros