7

I am an absolute beginner. I have never built a classifier or anything else in Weka using Java, although I have used the graphical interface before. Basically, I am kind of lost. I've looked at Weka's filter classes and played around with them a little. My documents are text documents, and I need to separate them into 2 categories.

I'm not sure how to define the categories or how to load the documents into an IDE to be classified.

:-(

Any help/tutorials or pointers would be greatly appreciated.

jenniem001

2 Answers

2

Using Weka for the first time is a pain, but you will have to work through it.

Also, I tried out Weka, but I had to drop it because of JVM out-of-memory exceptions. I wrote my own small clustering algorithm in Ruby, and its performance was far better.

Anyway, here is how to use SVM in Weka:

  1. You can follow this tutorial on how to use SVM in Weka: www.stat.nctu.edu.tw/~misg/WekaInC.ppt

  2. Next, you will need your data in ARFF format (I recommend this; in my experience it helps, because the data looks more structured from Weka's perspective). You can produce it with the XML2ARFF-Converter, which I wrote for myself. You can modify it to read text files and convert them to ARFF.
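As a rough illustration of step 2 for plain text files, here is a minimal sketch that uses only the Java standard library, not the XML2ARFF-Converter. It assumes one directory per category, with the directory name used as the class label. The class name `TextToArff`, the attribute names `text` and `class`, and the simplified quoting are all assumptions made for this example; labels are assumed to contain no spaces or commas.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TextToArff {

    // Quote a document as an ARFF string value.
    // Simplified assumption: escape backslashes and single quotes,
    // and flatten line breaks and tabs into spaces.
    static String quote(String text) {
        String escaped = text
                .replace("\\", "\\\\")
                .replace("'", "\\'")
                .replace("\r", " ")
                .replace("\n", " ")
                .replace("\t", " ");
        return "'" + escaped + "'";
    }

    // Build the ARFF content: one string attribute holding the document text
    // and one nominal attribute listing the categories.
    static String toArff(String relation, Map<String, List<String>> docsByLabel) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute text string\n");
        sb.append("@attribute class {")
          .append(String.join(",", docsByLabel.keySet()))
          .append("}\n\n");
        sb.append("@data\n");
        for (Map.Entry<String, List<String>> entry : docsByLabel.entrySet()) {
            for (String doc : entry.getValue()) {
                sb.append(quote(doc)).append(',').append(entry.getKey()).append('\n');
            }
        }
        return sb.toString();
    }

    // Read every regular file in a directory as one document (assumes UTF-8).
    static List<String> readDocs(Path dir) throws IOException {
        List<String> docs = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    docs.add(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
                }
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("Usage: java TextToArff <dirA> <dirB> <output.arff>");
            return;
        }
        // The directory names become the two category labels.
        Map<String, List<String>> docsByLabel = new LinkedHashMap<>();
        for (int i = 0; i < 2; i++) {
            Path dir = Paths.get(args[i]);
            docsByLabel.put(dir.getFileName().toString(), readDocs(dir));
        }
        String arff = toArff("documents", docsByLabel);
        Files.write(Paths.get(args[2]), arff.getBytes(StandardCharsets.UTF_8));
    }
}
```

The resulting file can be opened in the Weka interface. Because the text is stored as a single string attribute, it would typically need to be turned into word features (for example with Weka's StringToWordVector filter) before a classifier such as an SVM can be trained on it.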

zengr
  • Can you elaborate on the out-of-memory exceptions? I'm investigating whether Weka is a good fit for me. How bad are these problems? Did you look at increasing the heap? http://weka.wikispaces.com/OutOfMemoryException – Blub May 06 '11 at 15:11
  • I did not explore it much. But Weka is a widely used library, so I am sure you will find some smart workarounds. I did not use it because I had a reason to use Ruby, which I was trying to learn. – zengr May 06 '11 at 15:45
  • If you're working with large datasets, you'll commonly come up against memory limits. If you're hitting these using Weka, try increasing the JVM heap size with the -Xmx flag: "java -Xmx8000m -jar weka.jar" will run Weka with a maximum heap of 8000 MB. – Nicholas McCarthy May 07 '14 at 13:18
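
To check that a heap setting such as -Xmx really took effect, a small sketch like the one below can be run with the same flag. The class name `HeapCheck` is made up for this example; `Runtime.maxMemory()` is the standard Java call for the maximum heap the JVM will try to use.

```java
class HeapCheck {

    // Maximum amount of heap memory the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Running it as "java -Xmx8000m HeapCheck" should print a value close to 8000; the exact figure can differ somewhat depending on the JVM.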
2

I found this Java tutorial very helpful, although there are very few resources available online (that I have found):

http://www.cs.waikato.ac.nz/ml/weka/index_documentation.html

Hope this helps.

juanmirocks
Stina