I am currently working on a project in which I will use the naive Bayes classification method to classify email as spam or clean. I am using WEKA and the well-known SpamAssassin dataset for this. (The dataset can be found here: http://www.csmining.org/index.php/spam-assassin-datasets.html).

I have very little experience with WEKA, but I was told to use the StringToWordVector filter when preprocessing the data, and I am confused about how to do this. Has anyone worked with the SpamAssassin data in WEKA? Does anyone have helpful links on the preprocessing step?

1 Answer

Use the following tutorial: Text Classification and Clustering with WEKA. You need to convert your text data into numerical vectors; the StringToWordVector filter accomplishes this task.
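To make "text to numerical vectors" concrete, here is a minimal plain-Java sketch of the bag-of-words idea behind that kind of filter. It is not WEKA's implementation and does not call the WEKA API; the class name `BagOfWordsSketch`, its methods, and the sample messages are all made up for illustration, and the tokenizer (lowercase, split on non-alphanumerics) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: a tiny bag-of-words vectorizer.
class BagOfWordsSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Assign each distinct word in the corpus a column index, in alphabetical order.
    static Map<String, Integer> buildVocabulary(List<String> documents) {
        Map<String, Integer> vocab = new TreeMap<>();
        for (String doc : documents) {
            for (String tok : tokenize(doc)) {
                vocab.putIfAbsent(tok, 0);
            }
        }
        int index = 0;
        for (Map.Entry<String, Integer> entry : vocab.entrySet()) {
            entry.setValue(index++);
        }
        return vocab;
    }

    // Turn one document into a vector of word counts; unknown words are ignored.
    static int[] vectorize(String doc, Map<String, Integer> vocab) {
        int[] vector = new int[vocab.size()];
        for (String tok : tokenize(doc)) {
            Integer idx = vocab.get(tok);
            if (idx != null) {
                vector[idx]++;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // Made-up sample messages, not taken from the SpamAssassin corpus.
        List<String> emails = Arrays.asList(
                "Win money now",
                "Meeting at noon, money later");
        Map<String, Integer> vocab = buildVocabulary(emails);
        System.out.println(vocab.keySet());
        for (String email : emails) {
            System.out.println(Arrays.toString(vectorize(email, vocab)));
        }
    }
}
```

Each email becomes one row of numbers, one column per vocabulary word, which is the kind of input a naive Bayes classifier can work with. WEKA's filter offers further options (for example TF-IDF weighting and stopword handling) that this sketch leaves out.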

Atilla Ozgur