I am using the k-means++ clusterer from Apache Commons Math in an interactive genetic algorithm to reduce the number of individuals that are evaluated by the user.
Commons Math makes it very easy to use. The user only needs to implement the Clusterable interface. It has two methods: double distanceFrom(T p), which is quite clear, and T centroidOf(Collection<T> p), which lets the user pick the centroid of a cluster.
If used on Euclidean points, the centroid is very easy to calculate. For chromosomes, however, it is quite difficult, because their meaning is not always clear.
My question: Is there an efficient, generic way to pick the centroid that does not depend on the problem domain (e.g. by using only the distance)?
EDIT
OK, here is my code for the centroid calculation. The idea: the point with the lowest total distance to all other points in the cluster is the one nearest to the centroid.
public T centroidOf(Collection<T> c) {
    double minDist = Double.MAX_VALUE;
    T minP = null;
    // test every point p1 in c as a candidate
    for (final T p1 : c) {
        double totalDist = 0d;
        // sum up the distances from p1 to all other points p2 (p2 != p1)
        for (final T p2 : c) {
            if (p2 != p1) {
                totalDist += p1.distanceFrom(p2);
            }
        }
        // if the current total distance is lower than the min, take it as the new min
        if (totalDist < minDist) {
            minDist = totalDist;
            minP = p1;
        }
    }
    // null only if c is empty
    return minP;
}
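
For illustration, here is a minimal, self-contained sketch of the same idea applied to a toy chromosome. The point it selects is usually called the medoid of the cluster. Everything in it is an assumption made for the example and not part of Commons Math: the Distanced interface is a stand-in for Clusterable, BitChromosome is a hypothetical bit-string individual, and the Hamming distance is just one possible choice of distance.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class MedoidSketch {

    // Hypothetical stand-in for the distance part of Clusterable.
    interface Distanced<T> {
        double distanceFrom(T other);
    }

    // Hypothetical chromosome: a fixed-length bit string.
    static final class BitChromosome implements Distanced<BitChromosome> {
        final boolean[] genes;

        BitChromosome(String bits) {
            genes = new boolean[bits.length()];
            for (int i = 0; i < genes.length; i++) {
                genes[i] = bits.charAt(i) == '1';
            }
        }

        // Hamming distance: number of positions where the bits differ.
        @Override
        public double distanceFrom(BitChromosome other) {
            if (other.genes.length != genes.length) {
                throw new IllegalArgumentException("chromosomes must have equal length");
            }
            int diff = 0;
            for (int i = 0; i < genes.length; i++) {
                if (genes[i] != other.genes[i]) {
                    diff++;
                }
            }
            return diff;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (boolean g : genes) {
                sb.append(g ? '1' : '0');
            }
            return sb.toString();
        }
    }

    // Returns the member with the lowest total distance to all members,
    // or null for an empty collection. Assumes distanceFrom(self) == 0,
    // so the self-comparison does not need to be skipped.
    static <T extends Distanced<T>> T medoidOf(Collection<T> points) {
        double minDist = Double.MAX_VALUE;
        T best = null;
        for (T p : points) {
            double total = 0d;
            for (T q : points) {
                total += p.distanceFrom(q);
            }
            if (total < minDist) {
                minDist = total;
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<BitChromosome> cluster = Arrays.asList(
                new BitChromosome("0000"),
                new BitChromosome("0001"),
                new BitChromosome("0011"),
                new BitChromosome("1001"));
        // Total distances: 0000 -> 5, 0001 -> 3, 0011 -> 5, 1001 -> 5
        System.out.println(medoidOf(cluster)); // prints 0001
    }
}
```

Note that this needs a distance evaluation for every pair of points, so the cost grows quadratically with the cluster size. On ties, the first candidate in iteration order wins because of the strict comparison.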