WEKA HierarchicalClusterer class always returns 2 clusters

Question

Here is my code:

import weka.clusterers.ClusterEvaluation;
import weka.clusterers.HierarchicalClusterer;
import weka.clusterers.EM;
import weka.core.converters.CSVLoader;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.neighboursearch.PerformanceStats;

import java.io.File;
import java.io.IOException;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Enumeration;

import weka.core.*;

public class WEKASample1 {

public static void main(String[] args) {

    Instances data = null;
    CSVLoader csvLoader = new CSVLoader();
    try {
        csvLoader.setSource(new File("D:\\WEKA\\numbers.csv"));

        data = csvLoader.getDataSet();
                HierarchicalClusterer h = new HierarchicalClusterer();

            DistanceFunction d = new DistanceFunction() {

        @Override
        public void setOptions(String[] arg0) throws Exception {

        }

        @Override
        public Enumeration listOptions() {
            return null;
        }

        @Override
        public String[] getOptions() {
            return null;
        }

        @Override
        public void update(Instance arg0) {

        }

        @Override
        public void setInvertSelection(boolean arg0) {

        }

        @Override
        public void setInstances(Instances arg0) {

        }

        @Override
        public void setAttributeIndices(String arg0) {

        }

        @Override
        public void postProcessDistances(double[] arg0) {

        }

        @Override
        public boolean getInvertSelection() {
            return false;
        }

        @Override
        public Instances getInstances() {
            return null;
        }

        @Override
        public String getAttributeIndices() {
            return null;
        }

        @Override
        public double distance(Instance arg0, Instance arg1, double arg2,
                PerformanceStats arg3) {
            return 0;
        }

        @Override
        public double distance(Instance arg0, Instance arg1, double arg2) {
            return 0;
        }

        @Override
        public double distance(Instance arg0, Instance arg1, PerformanceStats arg2)
                throws Exception {
            return 0;
        }

        @Override
        public double distance(Instance arg0, Instance arg1) {

            double s1 = arg0.value(0);
            double s2 = arg1.value(0);

            return Double.POSITIVE_INFINITY;
        }
    };

    h.setDistanceFunction(d);
    SelectedTag s = new SelectedTag(1, HierarchicalClusterer.TAGS_LINK_TYPE);
    h.setLinkType(s);

    h.buildClusterer(data);


//      double[] arr;
//      for(int i=0; i<data.size(); i++) {
//          
//          arr = h.distributionForInstance(data.get(i));
//          for(int j=0; j< arr.length; j++)
//              System.out.print(arr[j]+",");
//          System.out.println();
//          
//      }

        System.out.println(h.numberOfClusters());
    } catch (Exception e) {
        e.printStackTrace();
    }

}

}

Now, the number of clusters reported is always 2, even if I modify the distance function. How do I find out which instance belongs to which cluster? When I uncomment the code above, which is meant to print the distribution for each instance, I get an array index out of bounds exception.

More generally, can anyone explain how WEKA performs the hierarchical clustering here?

Here is my data set, which has length 10 and dimension 2:

Has QUIT--Anony-Mousse · Answer 1 · 2012-06-05T20:31:25.877

Try a real data set, not an evenly spaced array of points.

In such an array, every point has the same distance to its nearest neighbor! With single link, this should end up as a single cluster, but maybe there are some rounding issues.

Plus, the distance function you use is all 0/Infinity, too!

Try using the Weka UI first.
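As far as I know, WEKA's HierarchicalClusterer keeps merging until a configured number of clusters remains (the `numClusters` property, option `-N`), and that target defaults to 2, which would explain the constant result no matter what the distance function returns. The following is a minimal plain-Java sketch of that idea, not WEKA's actual code: the class `SingleLinkSketch` and its methods are hypothetical names made up for illustration, and WEKA's tie handling and internals may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Illustrative single-link agglomerative clustering; not WEKA's implementation. */
class SingleLinkSketch {

    /** Merges the two closest clusters until numClusters remain; returns one cluster id per point. */
    static int[] cluster(double[][] points, int numClusters,
                         ToDoubleBiFunction<double[], double[]> dist) {
        // Start with every point in its own cluster.
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
        }
        while (clusters.size() > Math.max(1, numClusters)) {
            int bestA = 0, bestB = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = linkDistance(points, clusters.get(a), clusters.get(b), dist);
                    if (d < best) { best = d; bestA = a; bestB = b; }
                }
            }
            // Even if every distance is +Infinity, the first pair is still merged:
            // the loop only stops on the cluster count, never on the distance.
            clusters.get(bestA).addAll(clusters.remove(bestB));
        }
        int[] assignment = new int[points.length];
        for (int id = 0; id < clusters.size(); id++) {
            for (int p : clusters.get(id)) assignment[p] = id;
        }
        return assignment;
    }

    /** Single link: the distance between two clusters is that of their closest pair of members. */
    static double linkDistance(double[][] points, List<Integer> a, List<Integer> b,
                               ToDoubleBiFunction<double[], double[]> dist) {
        double min = Double.POSITIVE_INFINITY;
        for (int i : a) {
            for (int j : b) {
                min = Math.min(min, dist.applyAsDouble(points[i], points[j]));
            }
        }
        return min;
    }

    static double euclidean(double[] p, double[] q) {
        double sum = 0;
        for (int k = 0; k < p.length; k++) sum += (p[k] - q[k]) * (p[k] - q[k]);
        return Math.sqrt(sum);
    }

    /** Cluster ids are 0..k-1, so the count is the largest id plus one. */
    static int countClusters(int[] assignment) {
        int max = -1;
        for (int id : assignment) max = Math.max(max, id);
        return max + 1;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        // A real metric and a constant "all Infinity" function both end with 2 clusters.
        System.out.println(countClusters(cluster(pts, 2, SingleLinkSketch::euclidean)));
        System.out.println(countClusters(cluster(pts, 2, (p, q) -> Double.POSITIVE_INFINITY)));
    }
}
```

In this sketch the distance function only influences which points end up together, not how many clusters come out. If WEKA behaves the same way, changing the target count (for example via `setNumClusters`) is what changes the number of clusters.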

edited Jun 05 '12 at 20:31

answered Jun 05 '12 at 13:04

Thanks, will do. But the distance function written here tells WEKA not to cluster any two instances by returning an Infinity value, right? – London guy Jun 05 '12 at 18:47
I wouldn't bet that this says "do not cluster" to WEKA. It might just cluster these at the top level possible. They probably don't test for infinity. – Has QUIT--Anony-Mousse Jun 05 '12 at 20:32
Thanks. I modified the WEKA HierarchicalClusterer code now to stop the clustering process when a particular condition is not met. – London guy Jun 07 '12 at 03:52
WEKA hierarchical clustering could use a stop threshold. But I guess it is an `O(n^3)` implementation anyway, even for single-, average- and complete-link, where `O(n^2)` algorithms exist as far as I know. WEKA isn't very strong in clustering. – Has QUIT--Anony-Mousse Jun 07 '12 at 08:37
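
A stop threshold of the kind discussed in these comments can be sketched in plain Java. For single link, stopping once the closest pair of clusters is farther apart than a threshold gives the same groups as the connected components of the graph that joins every pair of points within that threshold, so the sketch below uses a small union-find instead of repeated merging. The class `ThresholdCutSketch`, its methods, and the parameter `maxMergeDistance` are hypothetical names for illustration; this is not the modification made to WEKA above.

```java
import java.util.function.ToDoubleBiFunction;

/** Illustrative single-link clustering cut at a distance threshold; not WEKA code. */
class ThresholdCutSketch {

    /** Joins every pair of points no farther apart than maxMergeDistance; returns a root id per point. */
    static int[] cluster(double[][] points, double maxMergeDistance,
                         ToDoubleBiFunction<double[], double[]> dist) {
        int n = points.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // With a finite threshold, an infinite distance never passes this test.
                if (dist.applyAsDouble(points[i], points[j]) <= maxMergeDistance) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }
        int[] assignment = new int[n];
        for (int i = 0; i < n; i++) assignment[i] = find(parent, i);
        return assignment;
    }

    /** Union-find lookup with path halving. */
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /** Each cluster has exactly one root, i.e. one point whose id is its own index. */
    static int countClusters(int[] assignment) {
        int count = 0;
        for (int i = 0; i < assignment.length; i++) {
            if (assignment[i] == i) count++;
        }
        return count;
    }

    static double euclidean(double[] p, double[] q) {
        double sum = 0;
        for (int k = 0; k < p.length; k++) sum += (p[k] - q[k]) * (p[k] - q[k]);
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Ten evenly spaced 2-D points; neighbors are sqrt(2) apart.
        double[][] pts = new double[10][];
        for (int i = 0; i < 10; i++) pts[i] = new double[]{i, i};
        System.out.println(countClusters(cluster(pts, 1.5, ThresholdCutSketch::euclidean))); // prints 1
        System.out.println(countClusters(cluster(pts, 1.0, ThresholdCutSketch::euclidean))); // prints 10
        System.out.println(countClusters(cluster(pts, 1.5, (p, q) -> Double.POSITIVE_INFINITY))); // prints 10
    }
}
```

Under this stopping rule an all-Infinity distance function really does mean "do not cluster", and an evenly spaced array chains into one cluster as soon as the threshold reaches the spacing, which matches the single-link behavior described in the answer.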
