
Given a 3D point cloud, how can I find the smallest bounding sphere that contains a given percentage of points?

I.e. if I have a point cloud with some noise, and I want to ignore 5% of outliers, how can I get the smallest sphere that contains the 95% remaining points, if I do not know which points are the outliers?

Example: I want to find the green sphere, not the red sphere:

[Figure: a point cloud with a few distant outliers; the green sphere encloses the ~95% main cluster, the red sphere encloses all points.]

I am looking for a reasonably fast and simple algorithm. It does not have to find the optimal solution, a reasonable approximation is fine as well.

I know how to calculate the approximate bounding sphere for 100% of points, e.g. with Ritter's algorithm.

How can I generalize this to an algorithm that finds the smallest sphere containing x% of points?
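For reference, this is roughly what I use for the 100% sphere: a minimal sketch of Ritter's algorithm in Python (plain tuples, no external libraries; the function name is mine).

```python
import math

def ritter_bounding_sphere(points):
    """Approximate 100% bounding sphere (Ritter's algorithm).
    `points` is a list of (x, y, z) tuples."""
    # Pick any point x, the point y farthest from x, then z farthest from y.
    x = points[0]
    y = max(points, key=lambda p: math.dist(p, x))
    z = max(points, key=lambda p: math.dist(p, y))

    # Start with the sphere spanning y..z.
    center = tuple((yi + zi) / 2 for yi, zi in zip(y, z))
    radius = math.dist(y, z) / 2

    # Grow the sphere just enough to take in each point still outside it.
    for p in points:
        d = math.dist(p, center)
        if d > radius:
            t = (d - radius) / (2 * d)   # fraction to shift the center toward p
            center = tuple(ci + (pi - ci) * t for ci, pi in zip(center, p))
            radius = (radius + d) / 2
    return center, radius
```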

HugoRune
  • How are these points distributed? Is the example typical (in that there will be a small cluster of points apart from the main cluster)? – Dave Sep 26 '16 at 20:39

5 Answers


Just an idea: binary search.

First, use one of the bounding-sphere algorithms to find the 100% bounding sphere.

Fix the centerpoint of the 95% sphere to be the same as that of the 100% sphere. (There is no guarantee it is, but you say you're OK with an approximate answer.) Then binary search on the radius of the sphere until you get 95% ± epsilon of the points inside.

Assuming the points are sorted by their distance (or squared distance, which is slightly faster) from the centerpoint, for a fixed radius r it takes O(log n) operations to find the number of points inside the sphere of radius r, e.g. by another binary search. The binary search for the right r itself requires a logarithmic number of such evaluations. Therefore the whole search should take just O(log² n) steps after you have found the 100% sphere.

Edit: if you think the center of the reduced sphere could end up too far from that of the full sphere, you can recalculate the bounding sphere, or just the center of mass of the point set, each time after throwing away some points. Each recalculation should take no more than O(n). After recalculating, re-sort the points by their distance from the new centerpoint. Since you expect them to be nearly sorted already, you can rely on bubble sort, which works in O(n + epsilon) on nearly-sorted data. Remember that only a logarithmic number of these tests is needed, so you should be able to get away with close to O(n log² n) for the whole thing.

It depends on exactly what performance you're looking for and what you're willing to sacrifice for it. (I would be happy to learn that I'm wrong and there's a good exact algorithm for this.)
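A sketch of the fixed-centre idea in Python, using the centroid as a stand-in for the 100% sphere's centre (per the edit above). One shortcut: with the squared distances sorted, the 95% radius can simply be read off the sorted list instead of binary-searched; the result is the same. Names are illustrative.

```python
import math

def percentile_sphere(points, fraction=0.95):
    """Sphere centred on the centroid that contains `fraction` of the points.
    The centroid stands in for the 100% bounding-sphere centre."""
    n = len(points)
    center = tuple(sum(c) / n for c in zip(*points))
    # Sort squared distances once; the radius is then read off directly.
    d2 = sorted(sum((pi - ci) ** 2 for pi, ci in zip(p, center))
                for p in points)
    k = max(0, math.ceil(fraction * n) - 1)   # last point that must be inside
    return center, math.sqrt(d2[k])
```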

kfx
  • If I can assume that the centerpoints of the spheres are the same, this idea seems sound. But I don't think I can make that assumption. If I have even one noise point that is very far away, then the 100% sphere will have a center that is far from the center of the 95% sphere. So this only works when the noise points are spread approximately equally in each direction. Perhaps I need something like a 3D median to find the center. – HugoRune Sep 26 '16 at 15:24
  • I was thinking about this as well. I think using the mass center of the point set should give better results, on the average. – kfx Sep 26 '16 at 15:31
  • Good idea, but since you fix the circle centre (which looks sensible to me BTW), your O(log^2 n) time bound for finding the optimal radius r can be sped up: just sort the n points by their (squared) distances in O(n log n) time, and then in O(1) time simply *read off* the (x\*n)-th point in this sorted list! Assuming no 2 points are equidistant from the centre, that tells you the furthest point that needs to be included, from which you can immediately determine the radius. – j_random_hacker Sep 26 '16 at 17:01

The distance from the average point location would probably give a reasonable indication if a point is an outlier or not.

The algorithm might look something like:

  1. Find bounding sphere of points
  2. Find average point location
  3. Choose the point on the bounding sphere that is farthest from the average location, remove it as an outlier
  4. Repeat steps 1-3 until you've removed 5% of points
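A hedged sketch of the steps above in Python. It is simplified: the point farthest from the current mean is dropped directly, skipping the explicit bounding-sphere computation (that point lies on or near the bounding sphere anyway). Names are illustrative.

```python
def trim_outliers(points, fraction=0.05):
    """Greedily drop `fraction` of the points, one at a time,
    always removing the point farthest from the current mean."""
    pts = list(points)
    for _ in range(int(len(pts) * fraction)):
        n = len(pts)
        # Step 2: average point location.
        mean = tuple(sum(c) / n for c in zip(*pts))
        # Step 3: the farthest point from the average is the outlier.
        far = max(pts, key=lambda p: sum((pi - mi) ** 2
                                         for pi, mi in zip(p, mean)))
        pts.remove(far)
    return pts
```

The bounding sphere of the returned points then gives the final answer.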
ryanm
  • Counter example in 1d: {-10,-7,-6,9,10}, bounding sphere = (center 0, radius 10), average location (barycenter) = -4/5, most distant = +10, resulting sphere = (-10,9), though we could have used a much smaller sphere (-7,10). – aka.nice Sep 26 '16 at 19:17
  • Counter example 2: eliminate 1 outlier from {-10,-7,-6,-5,3,8,10}. Solution is removing -10 => diameter=17. Now eliminate 2 outliers: best solution = remove (8,10) => diameter=13. But if we eliminate a second element after -10, then solution is (-10,10) => diameter=15. So proceeding removals iteratively sounds subject to sub-optimality. – aka.nice Sep 26 '16 at 19:29
  • That said, sub-optimality is accepted, so maybe not that bad if we replace average location with geometric median to robustify a bit. – aka.nice Sep 27 '16 at 00:18
  • Agreed, it's a simple greedy algorithm so it's easy to find pathological input datasets. That said, I'm not sure I'm following your reasoning on counter example 2. The mean of the initial set is -1, so the first outlier would be 10. The mean of the remaining set is -2.8, so the second outlier would be 8. Have I misunderstood something? – ryanm Sep 27 '16 at 07:47
  • I'd also argue that, at least for the example data posted in the question, using the mean is a better choice than the median - using the median would lead to us removing points on the right-hand side of the green circle, thus wasting our 5% allowance that would be better spent on the points on the left. – ryanm Sep 27 '16 at 07:49
  • @ryanm the 2nd example demonstrates that even a perfect elimination of 1 point (compute the sphere for all N partitions of N-1 points and take the smallest), if applied recursively, would differ from a perfect elimination of two points, so the idea of iterating is in itself sub-optimal. – aka.nice Sep 27 '16 at 10:54
  • @ryanm I do not agree with the mean. In 1D again, [-10000,-9000,-100,1,2,3,4,5,6,7,8,9,10]: mean = -1465, median = +4, so the median is much closer to the right spot. Same in 2D. Effectively it does not make much difference if we eliminate only 1 point at a time and recompute the mean, but it would make a huge one if we eliminate several outliers in a row; that's more what I had in mind. – aka.nice Sep 27 '16 at 11:03
  • I see what you mean. I've edited the answer to clarify the repetition. – ryanm Sep 27 '16 at 11:14

The algorithm of ryanm is not that bad. I suggested robustifying it with a geometric median, then came to this sketch:

  1. compute the NxN inter-distances in O(N^2)
  2. sum each row of this matrix (= the distance of one point to the others) in O(N^2)
  3. sort the obtained "crowd" distance in O(N*log N)
    (the point with smallest distance is an approximation of the geometric median)
  4. remove the 5% largest in O(1)
    here we just consider largest crowd-distance as outliers,
    instead of taking the largest distance from the median.
  5. compute radius of obtained sphere in O(N)

Of course, it also suffers from sub-optimality, but it should perform a bit better in the case of far outliers. Overall cost is O(N^2).
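A sketch of steps 1-4 in Python (steps 1 and 2 are merged into one O(N^2) pass; step 5, the bounding sphere of the kept points, is left to the caller). Names are illustrative.

```python
import math

def crowd_distance_filter(points, keep=0.95):
    """Rank points by their total distance to all other points
    (the 'crowd' distance) and keep the fraction with the smallest totals."""
    # Steps 1-2: sum of distances from each point to all others, O(N^2).
    crowd = [sum(math.dist(p, q) for q in points) for p in points]
    # Step 3: sort by crowd distance; step 4: drop the largest tail.
    order = sorted(range(len(points)), key=lambda i: crowd[i])
    return [points[i] for i in order[:math.ceil(keep * len(points))]]
```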

aka.nice

I would iterate the following two steps until convergence:

1) Given a group of points, find the smallest sphere enclosing 100% of the points and work out its centre.

2) Given a centre, find the 95% of the original points that are closest to that centre.

Each step reduces (or at least does not increase) the radius of the sphere involved, so you can declare convergence when the radius stops decreasing.

In fact, I would iterate from multiple random starts, each start produced by finding the smallest sphere that contains all of a small subset of points. I note that if you have 10 outliers, and you divide your set of points into 11 parts, at least one of those parts will not have any outliers.

(This is very loosely based on https://en.wikipedia.org/wiki/Random_sample_consensus)
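A sketch of the two-step iteration in Python, with one simplification: the subset centroid stands in for the exact 100% bounding-sphere centre, so the radius-never-increases guarantee becomes approximate. Single start; names are illustrative.

```python
import math

def iterate_sphere(points, fraction=0.95, max_iter=50):
    """Alternate the two steps until the radius stops decreasing."""
    k = math.ceil(fraction * len(points))
    subset = list(points)
    prev_radius = float("inf")
    for _ in range(max_iter):
        # Step 1: centre of the current subset (centroid approximation).
        n = len(subset)
        center = tuple(sum(c) / n for c in zip(*subset))
        # Step 2: the `fraction` of all points closest to that centre.
        subset = sorted(points, key=lambda p: math.dist(p, center))[:k]
        radius = math.dist(subset[-1], center)
        if radius >= prev_radius:   # converged: radius stopped shrinking
            break
        prev_radius = radius
    return center, radius, subset
```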

mcdowella

Find the Euclidean minimum spanning tree, and check the edges in descending order of length. For each edge, consider the two sets of points in the two trees you get by deleting the edge.

If the smaller set of points is less than 5% of the total, and the bounding sphere of the larger set doesn't overlap it, then delete the smaller set of points. (The overlap condition is necessary in case you have an 'oasis' of empty space in the centre of your point cloud.)

Repeat this until you hit your threshold or the lengths are getting 'small enough' that you don't care to delete them.
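A rough Python sketch: Prim's algorithm builds the EMST of the complete graph in O(N^2), then edges are cut longest-first and the smaller side discarded while the 5% budget allows. The bounding-sphere overlap check from the answer is omitted for brevity, so this handles only the simple case; names are illustrative.

```python
import math

def emst_trim(points, max_drop=0.05):
    """EMST via Prim, then cut long edges and drop small components."""
    n = len(points)

    # Prim's algorithm over the complete graph, O(N^2).
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []   # MST edges as (length, child, parent)
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], u, parent[u]))
        for v in range(n):
            if not in_tree[v]:
                duv = math.dist(points[u], points[v])
                if duv < best[v]:
                    best[v] = duv
                    parent[v] = u

    # Cut edges longest-first; drop the smaller side if the budget allows.
    removed = set()
    budget = int(max_drop * n)
    for length, u, v in sorted(edges, reverse=True):
        if u in removed or v in removed:
            continue
        # Adjacency over the MST edges, minus the edge being cut.
        adj = {i: [] for i in range(n)}
        for _, a, b in edges:
            if (a, b) != (u, v):
                adj[a].append(b)
                adj[b].append(a)
        # Flood-fill the side of the cut containing u.
        stack, comp = [u], {u}
        while stack:
            for w in adj[stack.pop()]:
                if w not in comp and w not in removed:
                    comp.add(w)
                    stack.append(w)
        other = set(range(n)) - removed - comp
        small = comp if len(comp) <= len(other) else other
        if len(removed) + len(small) <= budget:
            removed |= small
    return [points[i] for i in range(n) if i not in removed]
```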

Dave
  • In your example, the longest edge of the MST would connect one of the four outliers to a point of the main cloud. The first thing you would check here is deleting that edge, which would leave you with the main point cloud and the outlier cloud. Then you'd confirm that the bounding circle of the main cloud doesn't include points of the outlier cloud, and eliminate them. – Dave Sep 26 '16 at 20:12