
First of all: I'm using moa-release-2019.05.0-bin/moa-release-2019.05.0/lib/moa.jar in my Java project.

Now, to the point: I am trying to use the `moa.clusterers.clustream.WithKmeans` stream clustering algorithm, and I have no idea why the clustering results look the way they do in the debugger (see below).

I am new to MOA and I am having a hard time figuring out how the clustering algorithms are meant to be used. The documentation lacks sample code for common use cases, the implementation is not well explained, and I have not found any tutorials either.

My code:

```java
import com.yahoo.labs.samoa.instances.DenseInstance;
import moa.cluster.Clustering;
import moa.clusterers.clustream.WithKmeans;

public class TestingClustream {
    static DenseInstance randomInstance(int size) {
        DenseInstance instance = new DenseInstance(size);
        for (int idx = 0; idx < size; idx++) {
            instance.setValue(idx, Math.random());
        }
        return instance;
    }

    public static void main(String[] args) {
        WithKmeans wkm = new WithKmeans();
        wkm.kOption.setValue(5);
        wkm.maxNumKernelsOption.setValue(300);
        wkm.resetLearningImpl();
        for (int i = 0; i < 10000; i++) {
            wkm.trainOnInstanceImpl(randomInstance(2));
        }
        Clustering clusteringResult = wkm.getClusteringResult();
        Clustering microClusteringResult = wkm.getMicroClusteringResult();
    }
}
```

Info from the debugger: (two screenshots, not reproduced here)

I have read the source code many times, and it seems to me that I am using the correct functions in the correct order. I do not know what I am missing. Any feedback is welcome!

onofricamila

1 Answer

Make sure you have fed the algorithm enough data; it processes the data in batches.
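
To illustrate the batching point, here is a simplified, hypothetical sketch (plain Java, no MOA dependency; the class and method names are made up) of CluStream-style warm-up buffering: incoming points are only collected until a buffer of size m is full, and no micro-clusters exist before that. This is not MOA's code, and the exact point at which MOA initializes may differ by one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of CluStream-style warm-up buffering.
// NOT MOA's implementation: it only shows that nothing is clustered
// until the buffer of size m has been filled.
public class BufferedInitSketch {
    private final int bufferSize;                  // plays the role of m
    private final List<double[]> buffer = new ArrayList<>();
    private boolean initialized = false;
    private int seen = 0;
    private int initializedAt = -1;                // point count that triggered initialization

    public BufferedInitSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public void train(double[] point) {
        seen++;
        if (!initialized) {
            buffer.add(point);
            if (buffer.size() == bufferSize) {
                // In CluStream, k-means over the buffered points would
                // create the initial micro-clusters at this step.
                initialized = true;
                initializedAt = seen;
                buffer.clear();
            }
            return;
        }
        // After initialization, points would be absorbed into micro-clusters.
    }

    public boolean isInitialized() { return initialized; }

    public int initializedAt() { return initializedAt; }

    public static void main(String[] args) {
        BufferedInitSketch s = new BufferedInitSketch(300);
        for (int i = 0; i < 299; i++) {
            s.train(new double[] {Math.random(), Math.random()});
        }
        System.out.println("after 299 points: " + s.isInitialized());
        s.train(new double[] {0.5, 0.5});
        System.out.println("after 300 points: " + s.isInitialized() + " (at point " + s.initializedAt() + ")");
    }
}
```

Running `main` prints `after 299 points: false` and then `after 300 points: true (at point 300)`.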

The fields you see in the debugger are unused, likely inherited from some parent class with a different purpose.

Use the getter methods instead, such as `getCenter()`, which computes the current center from the running sum.
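
For illustration, a minimal sketch of what "computing the center from the running sum" means, assuming a BIRCH/CluStream-style clustering feature that stores only the count N, the per-dimension linear sum LS and the sum of squares SS. The class `ClusteringFeatureSketch` is hypothetical and is not MOA's kernel class; the radius shown is the usual RMS-deviation definition, and MOA's `getRadius()` may scale it differently.

```java
// Hypothetical clustering feature (CF) in the BIRCH/CluStream style.
// Only running statistics are stored; the center is derived on demand.
public class ClusteringFeatureSketch {
    private double n;          // number of absorbed points (the "weight")
    private final double[] ls; // linear sum of the points, per dimension
    private final double[] ss; // sum of squares of the points, per dimension

    public ClusteringFeatureSketch(int dim) {
        ls = new double[dim];
        ss = new double[dim];
    }

    public void insert(double[] x) {
        n++;
        for (int d = 0; d < ls.length; d++) {
            ls[d] += x[d];
            ss[d] += x[d] * x[d];
        }
    }

    public double getWeight() { return n; }

    // Center = LS / N, recomputed from the running sums on every call.
    public double[] getCenter() {
        double[] c = new double[ls.length];
        for (int d = 0; d < ls.length; d++) {
            c[d] = ls[d] / n;
        }
        return c;
    }

    // RMS deviation from the center: sqrt(sum_d (SS_d / N - (LS_d / N)^2)).
    public double getRadius() {
        double sum = 0;
        for (int d = 0; d < ls.length; d++) {
            double mean = ls[d] / n;
            sum += ss[d] / n - mean * mean;
        }
        return Math.sqrt(Math.max(sum, 0)); // clamp tiny negative rounding errors
    }

    public static void main(String[] args) {
        ClusteringFeatureSketch cf = new ClusteringFeatureSketch(2);
        cf.insert(new double[] {0, 0});
        cf.insert(new double[] {2, 0});
        cf.insert(new double[] {0, 2});
        cf.insert(new double[] {2, 2});
        double[] c = cf.getCenter();
        System.out.println("weight=" + cf.getWeight() + " center=(" + c[0] + ", " + c[1] + ") radius=" + cf.getRadius());
    }
}
```

With the four corner points in `main`, this prints `weight=4.0 center=(1.0, 1.0) radius=1.4142135623730951`. Nothing named `center` is ever stored, which is why a field with that name can stay empty while the getter still returns the right value.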

Has QUIT--Anony-Mousse
  • Hey! Thanks for the fast answer. I checked the source code and found that the **buffer size is set to the value of m (max number of micro clusters)**, which in my case is 300. Check [this line on GitHub](https://github.com/Waikato/moa/blob/master/moa/src/main/java/moa/clusterers/clustream/WithKmeans.java#L83). Since I use 10,000 samples (the upper limit of `i` in my for loop), there shouldn't be any problem with that... Any ideas? – onofricamila Nov 18 '19 at 00:54
  • Also note that `getCenter()` seems to compute the center from LS, and the `center` field may be unused or not yet populated. But the more I look at the MOA code, the less convinced I am... – Has QUIT--Anony-Mousse Nov 18 '19 at 06:48
  • Yes! That worked. I also had to use `getWeight()` and cast the micro/macro clusters to **_SphereCluster_** so I could call `getRadius()`. It's strange... – onofricamila Nov 18 '19 at 18:41
  • Anony-Mousse! What's your opinion about the clusters' `weight` attribute? At first I thought it represented the **number of elements in one cluster**, but when I added up all the **micro-cluster weights**, I did not get the total number of samples (10,000) but a **smaller number**. And for the **macro clusters**, all the weights are the same: **0**. I also saw the field `N`, and it doesn't match the number of elements either. Do you understand that? Thanks for your support. – onofricamila Nov 18 '19 at 20:56
  • I've never used MOA (I've used Weka, and it was very slow). My understanding is that this may be based on BIRCH ideas, and BIRCH does discard points it considers to be outliers. – Has QUIT--Anony-Mousse Nov 19 '19 at 06:40
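
One plausible reason the micro-cluster weights add up to fewer than 10,000 (an assumption, not checked against MOA's source): CluStream keeps a fixed maximum number of micro-clusters, and when a new point fits none of them, an old micro-cluster may be deleted to make room, taking its points with it. The heavily simplified, hypothetical sketch below uses 1-D points and always deletes the least recently updated micro-cluster; real CluStream uses a relevance-stamp threshold and otherwise merges the two closest micro-clusters, which preserves weight.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified CluStream-style maintenance step.
// Assumption: when capacity is exceeded, the least recently updated
// micro-cluster is deleted (real CluStream uses a relevance stamp and may
// merge instead). A deleted micro-cluster takes its weight with it.
public class MicroClusterDeletionSketch {
    static final class Micro {
        double n;        // weight: number of absorbed points
        double ls;       // linear sum (1-D for brevity)
        long lastUpdate; // logical time of the last absorbed point

        Micro(double x, long t) { n = 1; ls = x; lastUpdate = t; }

        double center() { return ls / n; }
    }

    private final List<Micro> micros = new ArrayList<>();
    private final int maxMicros;
    private final double maxDistance; // absorb a point only if this close to the nearest center
    private long time = 0;

    public MicroClusterDeletionSketch(int maxMicros, double maxDistance) {
        this.maxMicros = maxMicros;
        this.maxDistance = maxDistance;
    }

    public void train(double x) {
        time++;
        Micro nearest = null;
        for (Micro m : micros) {
            if (nearest == null || Math.abs(m.center() - x) < Math.abs(nearest.center() - x)) {
                nearest = m;
            }
        }
        if (nearest != null && Math.abs(nearest.center() - x) <= maxDistance) {
            nearest.n++;
            nearest.ls += x;
            nearest.lastUpdate = time;
            return;
        }
        if (micros.size() == maxMicros) {
            Micro stalest = micros.get(0);
            for (Micro m : micros) {
                if (m.lastUpdate < stalest.lastUpdate) stalest = m;
            }
            micros.remove(stalest); // its points are discarded, not reassigned
        }
        micros.add(new Micro(x, time));
    }

    public double totalWeight() {
        double sum = 0;
        for (Micro m : micros) sum += m.n;
        return sum;
    }

    public static void main(String[] args) {
        MicroClusterDeletionSketch s = new MicroClusterDeletionSketch(2, 0.5);
        double[] stream = {0.0, 0.1, 10.0, 10.1, 20.0, 20.1};
        for (double x : stream) s.train(x);
        System.out.println("points seen: " + stream.length + ", total weight: " + s.totalWeight());
    }
}
```

With the six points in `main`, the sketch prints `points seen: 6, total weight: 4.0`: the first two points were dropped together with their micro-cluster when the third distinct group arrived.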