
I'm comparing various classification algorithms for a project using KNIME. I was very happy with the results I got for Support Vector Machines (LibSVM). I then wanted to try hierarchical classification and installed the RapidMiner plugin for KNIME. To get things working, I first tested the SVM implementation without hierarchies.

Comparing the results of the KNIME LibSVM implementation and the RapidMiner LibSVM implementation, I noticed that the RapidMiner implementation yielded much worse results: the KNIME implementation produced an error rate of approximately 2.4%, while the RapidMiner one produced an error rate of approximately 61%. Why is that? Am I doing something wrong?

[Image: Confusion matrix comparison]
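The error rates quoted above correspond to 1 - (correct predictions / all predictions), where the correct predictions are the diagonal of the confusion matrix. A minimal Python sketch, using a made-up 3-class matrix rather than the actual numbers from the comparison:

```python
def error_rate(confusion):
    """Fraction of misclassified examples: 1 - (sum of diagonal) / (sum of all cells).

    confusion[i][j] counts examples of true class i that were predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1.0 - correct / total


# Hypothetical 3-class confusion matrix (not the numbers from the screenshots).
example = [
    [50, 1, 0],
    [2, 45, 1],
    [0, 0, 41],
]
print(f"{error_rate(example):.3f}")  # 0.029 (4 errors out of 140 examples)
```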

I use C-SVC SVMs with a linear kernel, cost C = 1.0, epsilon = 0.001 and an 80 MB cache for both implementations.

[Images: RapidMiner workflow and options; KNIME options]

The documents are Wikipedia article texts that have been preprocessed, transformed into binary document vectors and labeled with a type (the class to predict).
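For readers unfamiliar with the representation: a binary document vector records only whether each vocabulary term occurs in a document (1) or not (0), discarding term frequency. A minimal Python sketch, assuming the documents are already tokenized and preprocessed; the function name is hypothetical and this is not how KNIME implements it internally:

```python
def binary_document_vectors(documents):
    """Map each tokenized document to a 0/1 vector over a shared vocabulary.

    A term's entry is 1 if the term occurs in the document at all, else 0;
    repeated occurrences do not increase the value.
    """
    vocabulary = sorted({term for doc in documents for term in doc})
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = []
    for doc in documents:
        vec = [0] * len(vocabulary)
        for term in doc:
            vec[index[term]] = 1
        vectors.append(vec)
    return vocabulary, vectors


# Hypothetical tokenized documents.
docs = [
    ["svm", "kernel", "svm"],
    ["kernel", "wikipedia"],
]
vocab, vecs = binary_document_vectors(docs)
print(vocab)  # ['kernel', 'svm', 'wikipedia']
print(vecs)   # [[1, 1, 0], [1, 0, 1]]
```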

I hope you can help me.

user2509422
  • It is a bit hard to say without the library versions (I guess RapidMiner uses an older LibSVM than KNIME), but it is possible that there was a bug in older LibSVM versions. It is also possible that I made an error in the data transformation and RapidMiner (within KNIME) sees the wrong data. Could you check with RapidMiner 5.3.15 outside KNIME? (It might also be interesting to check with the latest 7.1.x version, or the 7.2 preview if you have access to it. I know there are some crazy limitations in recent RapidMiner versions regarding import/export, so I understand if you prefer not to check those.) Thanks. – Gábor Bakos May 18 '16 at 18:13
  • Thank you very much for your reply. The data transformation seems to be the problem. Using RapidMiner Studio (5.3.15), the error rate dropped to 3.3%, which is sufficient to continue. – user2509422 May 20 '16 at 10:02
  • I have tried to reproduce your problem: https://drive.google.com/file/d/0B_71sAq09Nu2VzM3bmxoakNJN00/view?usp=sharing though I found only minor differences. What I find suspicious is that you are using a `Nominal to ...` operator within RapidMiner; I think that is not required. (In my workflow I removed the row ID column from the process to emphasize this change, but you can also just click the `Use` button on the `Row ID` tab so that the row ID is not used; the button text will then become `Do not use`.) – Gábor Bakos May 21 '16 at 09:15

1 Answer


You do not need to include the Row IDs in this case (on the Row ID tab, click the button so that it shows `Do not use` if it currently shows `Use` and the text field is not disabled), and you should not perform `Nominal to...` transformations on them. After that, you should get similar results in both cases.
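To see why a row ID should not be turned into features, consider what one-hot (dummy) encoding does to a column whose value is unique per row. A minimal Python sketch, assuming a simple encoder that maps unseen values to all-zero vectors; `dummy_encode` is a hypothetical helper, and this is not claimed to be exactly what RapidMiner's `Nominal to ...` operators do:

```python
def dummy_encode(values, categories):
    """One-hot encode nominal values against a fixed list of categories.

    In this sketch, a value not present in `categories` becomes an all-zero vector.
    """
    return [[1 if value == cat else 0 for cat in categories] for value in values]


# Hypothetical row IDs: unique per document, so they carry no class information.
train_ids = ["Row0", "Row1", "Row2", "Row3"]
test_ids = ["Row4", "Row5"]

categories = sorted(set(train_ids))  # categories taken from the training data only
train_cols = dummy_encode(train_ids, categories)
test_cols = dummy_encode(test_ids, categories)

print(len(categories))  # 4: one extra attribute per training row
print(train_cols)       # identity matrix: each training row gets a private feature
print(test_cols)        # all zeros: test rows match none of the ID attributes
```

Each training row ends up with a feature that no other row shares, which a linear SVM can use to fit individual training examples; that fit cannot carry over to unseen rows. This is one plausible way such a transformation can hurt accuracy, not a confirmed explanation of the 61% figure.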

Gábor Bakos