RapidMiner and WEKA: Different clustering results

Question

I am new to data mining analytics and machine learning. For a college assignment, I have been trying to compare predictive analysis and clustering analysis in RapidMiner and Weka.

After studying the advantages and disadvantages of both tools, I started the analysis and ran into some problems. I ran clustering with K-means in RapidMiner and SimpleKMeans in Weka, and regression with LinearRegression. I am not satisfied with the results, because they differ significantly between the two tools. I used the same numerical dataset in every case.

I have spent a lot of time studying how each algorithm is initialized in each tool. The interfaces are different, and some parameters exist in RapidMiner but not in Weka (or vice versa), so I am a bit confused. (Is that the problem?)

Apart from that, what do you think is wrong? Is there some initialization step that I missed? Or is it because the implementations differ between the tools, even though they use the same algorithm?

Thank you for your answer!

score 2 · Accepted Answer · answered Dec 02 '14 at 17:42


Weka often applies built-in normalization, at least in k-means and some other algorithms.

Make sure you disable it if you want the results to be comparable.
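Here is why this normalization matters. My understanding is that Weka's EuclideanDistance (the default distance function of SimpleKMeans) rescales each attribute to [0, 1] using that attribute's minimum and maximum. The switch that turns this off is the dontNormalize property of the distance function. Neither detail is stated in this thread, so please verify both against your Weka version.

The minimal sketch below assumes that min-max scheme. It uses made-up points to show that the same point can be nearest to a different centre depending on whether the attributes are rescaled. The helper names are hypothetical.

```python
import math

def min_max_normalize(rows):
    """Rescale every column to [0, 1] using that column's min and max
    (assumed to mirror Weka's default distance normalization)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def nearest(point, centers):
    """Index of the center closest to `point` in Euclidean distance."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Hypothetical data: attribute 0 spans 0..1, attribute 1 spans 0..1000.
rows = [[0.0, 0.0], [1.0, 1000.0], [0.9, 400.0]]
norm = min_max_normalize(rows)

# Treat the first two rows as cluster centers and ask where the third row goes.
raw_choice = nearest(rows[2], rows[:2])    # large-range attribute dominates
norm_choice = nearest(norm[2], norm[:2])   # both attributes weigh equally
print(raw_choice, norm_choice)             # prints: 0 1
```

On raw values, the point joins centre 0 because attribute 1 dominates the distance. After rescaling, it joins centre 1. With real data, this changes cluster assignments wholesale. That is why one tool normalizing and the other not is enough to make results diverge.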

Also keep in mind that k-means is a randomized algorithm (the initial centroids are chosen at random), so different results, even from the same package, are to be expected (and desirable).
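The effect of random initialization can be reproduced with a plain Lloyd's k-means. This is a minimal sketch, not Weka's or RapidMiner's actual implementation: the initial centres are k distinct points drawn with a seeded random generator, and the function names are hypothetical.

On four points at the corners of a wide rectangle, some initializations converge to the natural left/right split. Others get stuck in a worse top/bottom split. Fixing the seed makes a single run reproducible, and both tools expose some kind of random seed setting.

```python
import math
import random

def kmeans(points, k, seed, iters=100):
    """Lloyd's k-means; initial centers are k distinct points chosen at random."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers

def partition(labels):
    """Cluster IDs are arbitrary, so represent a result as groups of point indices."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())

# Four corners of a wide rectangle (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
LEFT_RIGHT = frozenset({frozenset({0, 1}), frozenset({2, 3})})
TOP_BOTTOM = frozenset({frozenset({0, 2}), frozenset({1, 3})})

# The same seed always gives the same clustering...
assert kmeans(points, 2, seed=1) == kmeans(points, 2, seed=1)
# ...but different seeds can end in different local optima.
found = {partition(kmeans(points, 2, seed=s)[0]) for s in range(20)}
print(len(found), "distinct partition(s) across 20 seeds")
```

Comparing two single runs, from one tool or from two, therefore says little on its own. Fix the seeds, or compare the best of several runs.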


Has QUIT--Anony-Mousse


Thank you, that's just what I needed. But how do I disable it? I've searched in the Weka Explorer but found nothing about how to disable it. I've also done some Google searching and found a paper that explains it, but it said nothing about which normalization method Weka uses. Since I don't know how to disable it, I've tried adding a normalization operator in RapidMiner with every method available, and the results are still not comparable. Do you have any idea? I appreciate your response very much! :) – M.R. Murazza Dec 05 '14 at 07:20
IIRC (I don't use Weka much; ELKI is much faster), there was an option on the distance function. – Has QUIT--Anony-Mousse Dec 05 '14 at 08:33

score 0 · Answer 2 · answered Dec 02 '14 at 14:50


Did you use WEKA itself or RapidMiner's WEKA extension? Did you try comparing the results of standalone WEKA with those of the RapidMiner WEKA extension?


mschmitz


I used WEKA itself. Yes, I've tried that too, and the results are the same. So the problem is indeed the SimpleKMeans algorithm in Weka, as Anony-Mousse answered: it contains built-in normalization. – M.R. Murazza Dec 05 '14 at 07:23
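Comparing clusterings from two tools needs a measure that ignores cluster names. Cluster numbers are arbitrary: Weka's cluster 0 may be RapidMiner's "cluster_2", so comparing label columns directly can report a mismatch where there is none.

A minimal sketch, assuming you export each tool's cluster assignment per row in the same row order, is the Rand index. It is the fraction of point pairs on which both clusterings agree (both put the pair together, or both keep it apart). The function name and the example labels below are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.
    1.0 means identical partitions, whatever the cluster names are."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical exports: the same partition under different cluster names.
weka = [0, 0, 1, 1, 2]
rapidminer = ["cluster_2", "cluster_2", "cluster_0", "cluster_0", "cluster_1"]
print(rand_index(weka, rapidminer))  # prints: 1.0
```

A value of 1.0 means the two tools found the same clustering despite different labels. For a chance-corrected score, scikit-learn's adjusted_rand_score implements the adjusted Rand index.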
