11

I have one set of points, X, which is small (say 1-20 points), and a second, much larger set of points, Y. I need to choose the point from Y whose sum of distances to all points in X is minimal.

My idea was to treat the points of X as the vertices of a polygon, find the centroid of this polygon, and then choose the point from Y nearest to the centroid. But I'm not sure whether the centroid minimizes the sum of distances to the vertices of the polygon, so I'm not sure whether this is a good approach. Is there an algorithm for solving this problem?

Points are defined by geographical coordinates.

Pawel Markowski
  • Do you mean latitude-longitude on a curved surface, or x-y on a plane? – David Thornley Jan 05 '11 at 22:31
  • The centroid doesn't minimize the sum of distances to the vertices. For example, for a triangle the Torricelli point (http://en.wikipedia.org/wiki/Torricelli_point) is optimal. – adamax Jan 06 '11 at 14:48

4 Answers

4

The centroid of the polygon might not be the right point, but a point that minimizes the sum of distances does exist.

In the paper "n-ellipses and the minimum distance problem", it is shown that if the points (called foci; your set X) are not collinear, then:

  • There is a unique point (called the center) for which the sum of distances is minimized. Provided the center is not itself one of the foci, the sum of the unit vectors from that point to the foci is zero!

  • The locus of points for which the sum of distances is constant is a convex curve (called an n-ellipse) containing the center.

  • The n-ellipse for distance D completely contains the n-ellipse for any smaller distance D' < D.

Thus you can use some kind of hill-climbing algorithm to find the center.

Of course, these n-ellipses are not necessarily circles, so just picking the point of Y closest to the center might not give the optimum, but it might be a good approximation.
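As a rough illustration of the hill-climbing idea, here is a minimal Python sketch. It assumes planar (x, y) coordinates and Euclidean distance, and the names `total_distance` and `hill_climb_center` are made up for this example. It tries axis-aligned moves and halves the step when none of them helps. Since the objective is convex, there is no second local minimum to get trapped in, although this simple axis-only search can stall if the current point lands exactly on one of the foci.

```python
import math

def total_distance(p, points):
    """Sum of Euclidean distances from p to every point in points."""
    return sum(math.hypot(p[0] - x, p[1] - y) for x, y in points)

def hill_climb_center(points, step=1.0, tol=1e-7):
    """Approximate the point minimizing total_distance by hill climbing.

    Assumes planar coordinates. Starts at the mean of the points and
    tries moves of size `step` along each axis, halving `step` whenever
    no move improves the cost, until `step` drops below `tol`.
    """
    n = len(points)
    cur = (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
    cur_cost = total_distance(cur, points)
    while step > tol:
        improved = False
        for dx, dy in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = (cur[0] + dx, cur[1] + dy)
            cost = total_distance(cand, points)
            if cost < cur_cost:
                # Accept the first improving move and search again from there.
                cur, cur_cost, improved = cand, cost, True
                break
        if not improved:
            step /= 2.0
    return cur

# Example: for this isosceles triangle the minimizing point lies on the
# axis of symmetry, below the mean (0, 1).
center = hill_climb_center([(-1.0, 0.0), (1.0, 0.0), (0.0, 3.0)])
```

The starting step should be on the scale of the spread of the points; 1.0 is only a placeholder default.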

You can perhaps do some preprocessing on the 20 points (if those are fixed) to figure out a good partitioning scheme (based on the above information).

Hope that helps.

  • I think there's no need for simulated annealing. A simple hill climbing will do, since there's only one local minimum here. – adamax Jan 06 '11 at 14:45
  • @adam: Yeah, I meant hill climbing actually ( out of touch with those :-)). Thanks, will edit. –  Jan 06 '11 at 18:04
1

If you want to minimize the sum of the squares of the distances (not the sum of the distances), then the point that minimizes that sum is the average of the points in X.

Proof:

sum(squares of distances) = (x-x0)^2 + (y-y0)^2 + (x-x1)^2 + (y-y1)^2 + ... 

d/dx sum(squares of distances) = 2(x-x0) + 2(x-x1) + ... = 2(Nx - x0 - x1 - ...)

The sum is minimized when the derivative is zero, which occurs when Nx = x0+x1+..., so x = (x0+x1+...)/N. The same argument applies to y.

The function is quadratic and symmetric around this point: the sum of squared distances from any point p equals N times the squared distance from p to the average, plus a constant that does not depend on p. So the closest point in Y to this average point is the best.

Minimizing the sum of the distances themselves is harder, but I suspect the same algorithm, with more leeway in the set of Ys that you test, would also work.
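A small Python sketch of the squared-distance case, assuming planar (x, y) tuples; the function names are invented for this example. It picks the point of Y nearest to the average of X and compares that with a direct brute-force minimization of the sum of squared distances. The two should agree because the sum of squared distances from p is N times the squared distance from p to the average, plus a constant.

```python
def mean_point(points):
    """Average (centroid) of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def sq_dist(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def best_by_mean(xs, ys):
    """Point of ys nearest to the average of xs."""
    m = mean_point(xs)
    return min(ys, key=lambda y: sq_dist(y, m))

def best_by_brute_force(xs, ys):
    """Point of ys minimizing the sum of squared distances to xs."""
    return min(ys, key=lambda y: sum(sq_dist(y, x) for x in xs))

xs = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
ys = [(1.0, 1.0), (3.0, 3.0), (-2.0, 5.0)]
print(best_by_mean(xs, ys), best_by_brute_force(xs, ys))
```

Note that this shortcut is only exact for squared distances; for the plain sum of distances it gives a candidate, not a guarantee.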

Keith Randall
  • I don't think you are using the term sum of squares in the usual way. If we are talking about a valid metric then the distance between any two points will always be greater than or equal to 0. – Samsdram Jan 06 '11 at 01:46
  • I mean the usual sum of squares metric, and it is always >= 0. What makes you think it isn't? – Keith Randall Jan 06 '11 at 02:49
  • My point has more to do with clarity of exposition than math. The OP asked for the point in Y that minimizes the sum of the distances between that point and the points in X. The OP didn't specify a distance metric though, such as the Euclidean norm which you describe as the sum of squares. Suppose though the OP asked for the point in Y that minimized the sum of the squared distance between the point in Y and the points in X. Then the spatial mean wouldn't be a workable solution. – Samsdram Jan 06 '11 at 03:37
  • In the absence of a specified norm, I was assuming the OP meant the Euclidean norm. I am using the square of the Euclidean norm, mostly because the math works out. So I think you have it backwards: the spatial mean works for the square of the Euclidean norm, not the Euclidean norm itself. – Keith Randall Jan 06 '11 at 16:33
1

Because you want the minimal sum of distances, I believe that you can reduce the set of points X to its spatial mean. Then you can use a k-d tree or some other spatial partitioning tree to find the point in Y closest to the spatial mean of X. Using a spatial partitioning tree can save a good bit of work compared to checking all the possible points.

Samsdram
0

Excuse me for suggesting brute force. The way the question is posed, we do not know where X and Y lie. Suppose X has 30 points and Y has 1000 points. Then for each point of Y, sum its 30 distances to the points of X. That is 30000 distance calculations altogether, done in a jiffy. This guarantees a minimum. Finding some "center" of X and choosing the closest point of Y will give only an approximate solution.

The more interesting question is to find such a point for X alone, ignoring Y. When X has only three points, the Fermat-Torricelli point solves the problem.
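A minimal brute-force sketch in Python. Since the question mentions geographical coordinates, this assumes points are (latitude, longitude) pairs in degrees and uses the haversine great-circle distance on a spherical Earth; the radius constant and the function names are assumptions for this example, and any other distance function can be passed in instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def best_candidate(xs, ys, dist=haversine_km):
    """Point of ys with the smallest sum of distances to all points of xs.

    Checks every candidate, so it costs len(xs) * len(ys) distance
    evaluations and always returns an exact minimizer within ys.
    """
    return min(ys, key=lambda y: sum(dist(y, x) for x in xs))

xs = [(0.0, 0.0), (0.0, 2.0)]
ys = [(0.0, 1.0), (10.0, 1.0), (0.0, 50.0)]
print(best_candidate(xs, ys))
```

If several points of Y tie for the minimum, `min` returns the first one it encounters.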