I'm trying to cluster image data (stored in 100 separate csv files) with ELKI's XMeans algorithm. It works well for the first two files, but then the algorithm hangs on forever while processing the third file. It looks like the problem occurs at every 3rd file or so, because when I start the loop, that goes over all files at the fourth file, it works for the fourth and the fifth file, but not for the sixth file. Same goes for the 9th and 11th file... but maybe that's coincidence.
My XMeans call looks like this:
DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(data);
Database db = new StaticArrayDatabase(dbc, null);
db.initialize();
Relation<NumberVector> rel = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD);
DBIDRange ids = (DBIDRange) rel.getDBIDs();
SquaredEuclideanDistanceFunction dist = SquaredEuclideanDistanceFunction.STATIC;
RandomlyGeneratedInitialMeans init = new RandomlyGeneratedInitialMeans(RandomFactory.DEFAULT);
KMeansInitialization initializer = new FirstKInitialMeans();
PredefinedInitialMeans splitInitializer = new PredefinedInitialMeans(data);
KMeansQualityMeasure informationCriterion = new WithinClusterMeanDistanceQualityMeasure();
RandomFactory random = new RandomFactory(123);
KMeans<NumberVector, KMeansModel> innerKMeans = new KMeansHamerly<>(dist, 50, 1, init, true);
XMeans<NumberVector, KMeansModel> xm = new XMeans<>(dist, 5, 50, 1, innerKMeans, initializer, splitInitializer, informationCriterion, random);
Clustering<KMeansModel> c = xm.run(db, rel);
I'm not too sure about these four lines, so maybe that's why it works for some files and for others it doesn't:
KMeansInitialization initializer = new FirstKInitialMeans();
PredefinedInitialMeans splitInitializer = new PredefinedInitialMeans(data);
KMeansQualityMeasure informationCriterion = new WithinClusterMeanDistanceQualityMeasure();
RandomFactory random = new RandomFactory(123);
data is just a double[][] which contains the data from the input files.
Any help would be very appreciated!