WEKA Preprocessing using SpamAssassin Dataset using stringtowordvector

Question

I am currently working on a project in which I will use the naive Bayes classification method to classify email as spam or clean. I am using WEKA and the well-known SpamAssassin dataset for this. (The dataset can be found here: http://www.csmining.org/index.php/spam-assassin-datasets.html).

I have very little experience with WEKA, but I was told to use the StringToWordVector filter when preprocessing the data, and I am confused about how to do this. Has anyone worked with the SpamAssassin data in WEKA? Does anyone have helpful links on preprocessing it?

score 1 · Answer 1 · answered Apr 21 '13 at 21:26


Follow the tutorial "Text Classification and Clustering with WEKA". You need to convert your text data into numerical vectors, and the StringToWordVector filter accomplishes this task.
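To make "text to numerical vectors" concrete, here is a minimal plain-Java sketch of the bag-of-words idea. It is a conceptual illustration only, not WEKA's implementation: the class name BagOfWordsSketch, its helper methods, and the two sample messages are all made up for this example, and WEKA's own filter offers further options (for instance word counts or TF-IDF weighting instead of simple presence flags).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Conceptual sketch only: this is NOT WEKA's StringToWordVector implementation.
// All names here are hypothetical and chosen for illustration.
final class BagOfWordsSketch {

    // Split a document into lowercase word tokens on any run of non-letters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collect every distinct word across all documents, in sorted order.
    // Each word becomes one numeric attribute (one vector position).
    static List<String> buildVocabulary(List<String> docs) {
        TreeSet<String> vocab = new TreeSet<>();
        for (String d : docs) {
            vocab.addAll(tokenize(d));
        }
        return new ArrayList<>(vocab);
    }

    // Map one document to a 0/1 vector: 1 if the vocabulary word occurs in it.
    static int[] toVector(String doc, List<String> vocab) {
        int[] v = new int[vocab.size()];
        for (String t : tokenize(doc)) {
            int i = vocab.indexOf(t);
            if (i >= 0) {
                v[i] = 1;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        // Made-up sample messages, not taken from the SpamAssassin corpus.
        List<String> docs = Arrays.asList("Win money now", "Meeting notes for now");
        List<String> vocab = buildVocabulary(docs);
        System.out.println(vocab);
        for (String d : docs) {
            System.out.println(Arrays.toString(toVector(d, vocab)));
        }
    }
}
```

Each email ends up as a fixed-length row of numbers, one column per word, which is the kind of input a classifier such as naive Bayes can work with.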


Atilla Ozgur

