
I'm trying to use DBSCANClusterer from the org.apache.commons.math3.ml.clustering package with no success. I'm using Apache Commons Math 3.4.1.

When I run the DBSCANClusterer.cluster() method I always get one cluster with one point, which always corresponds to the first point in my list of Points.


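import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math3.ml.clustering.Cluster;
import org.apache.commons.math3.ml.clustering.DBSCANClusterer;
import org.apache.commons.math3.ml.clustering.DoublePoint;

public class Main {
    public static void main(String[] args) {
        // eps = 0.9, minPts = 2
        DBSCANClusterer<DoublePoint> dbscan = new DBSCANClusterer<DoublePoint>(0.9, 2);
        List<DoublePoint> points = new ArrayList<DoublePoint>();

        double[] foo = new double[2];

        for (int i = 0; i < 1000; i++) {
            foo[0] = 10 + i;
            foo[1] = 20 + i;
            points.add(new DoublePoint(foo));
        }

        List<Cluster<DoublePoint>> cluster = dbscan.cluster(points);

        // My output here is always: [1009.0, 1019.0]
        for (Cluster<DoublePoint> c : cluster) {
            System.out.println(c.getPoints().get(0));
        }
    }
}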

My output is always [1009.0, 1019.0]. What am I doing wrong here?

  • Did you ever plot your points? :) I think it will become very obvious to you. – Thomas Jungblut Mar 30 '15 at 15:28
  • Regardless of the set of points and eps chosen, the result is always the same. The variable 'cluster' has the first point in the list 'points'. I've even tested with points really close to each other, like 10.0001 to 10.0005 – João Vale Mar 30 '15 at 15:52
  • For instance, I've changed the 'foo[] = ' lines to "foo[0] = 10 + r.nextInt(100)/1000;". Still no success – João Vale Mar 30 '15 at 16:04

2 Answers


Your first "cluster" is probably the noise cluster. With your parameters, all data is noise, and thus you only get one cluster, containing all your points.

Has QUIT--Anony-Mousse

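You create 1000 identical points because you reuse the same foo array for each point. DoublePoint keeps a reference to the array it is given rather than copying it, so every point in the list ends up holding the last values written, [1009.0, 1019.0]. Allocate a new array for each point inside the loop instead.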
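To make the aliasing visible without pulling in the library, here is a minimal sketch. `RefPoint` is a hypothetical stand-in that, like DoublePoint is assumed to do here, stores the array reference it receives. `AliasingDemo`, `buildShared`, `buildFresh` and `countDistinct` are likewise names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AliasingDemo {
    // Hypothetical stand-in for DoublePoint: keeps the reference, no copy.
    static class RefPoint {
        private final double[] coords;

        RefPoint(double[] coords) {
            this.coords = coords;
        }

        @Override
        public String toString() {
            return Arrays.toString(coords);
        }
    }

    // Pattern from the question: one array shared by every point.
    static List<RefPoint> buildShared(int n) {
        List<RefPoint> points = new ArrayList<>();
        double[] foo = new double[2];
        for (int i = 0; i < n; i++) {
            foo[0] = 10 + i;
            foo[1] = 20 + i;
            points.add(new RefPoint(foo)); // same array object every time
        }
        return points;
    }

    // Fixed pattern: a fresh array is allocated for each point.
    static List<RefPoint> buildFresh(int n) {
        List<RefPoint> points = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            points.add(new RefPoint(new double[] {10 + i, 20 + i}));
        }
        return points;
    }

    // Counts distinct coordinate pairs by their printed form.
    static int countDistinct(List<RefPoint> points) {
        Set<String> seen = new HashSet<>();
        for (RefPoint p : points) {
            seen.add(p.toString());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(buildShared(1000))); // 1
        System.out.println(buildShared(1000).get(0));         // [1009.0, 1019.0]
        System.out.println(countDistinct(buildFresh(1000)));  // 1000
    }
}
```

In the question's code the equivalent change is to move `double[] foo = new double[2];` inside the loop. Note that once the points are distinct, consecutive points are √2 ≈ 1.41 apart, so with eps = 0.9 no point has a neighbour within range and DBSCAN would treat every point as noise until eps is raised above that distance.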

T. Neidhart