I am looking for an efficient algorithm for the following problem:
Given a set of points in 2D space, where each point is defined by its X and Y coordinates. Required to split this set of points into a set of clusters so that if distance between two arbitrary points is less then some threshold, these points must belong to the same cluster:
In other words, such cluster is a set of points which are 'close enough' to each other.
The naive algorithm may look like this:
- Let R be a resulting list of clusters, initially empty
- Let P be a list of points, initially contains all points
- Pick random point from P and create a cluster C which contains only this point. Delete this point from P
- For every point Pi from P 4a. For every point Pc from C 4aa. If distance(Pi, Pc) < threshold then add Pi to C and remove it from P
- If at least one point was added to cluster C during the step 4, go to step 4
- Add cluster C to list R. if P is not empty, go to step 3
However, naive approach is very inefficient. I wonder if there is a better algorithm for this problem?
P.S. I don't know the number of clusters apriori