I am working on a project that uses naive Bayes classification to label email as spam or clean. I am using WEKA and the well-known SpamAssassin dataset, which can be found here: http://www.csmining.org/index.php/spam-assassin-datasets.html
I have very little experience with WEKA, but I was told to use the StringToWordVector filter when preprocessing the data, and I am confused about how to apply it. Has anyone worked with the SpamAssassin data in WEKA? Does anyone have links that would help with the preprocessing step?
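For context, my rough understanding is that the filter turns each message's text into a vector of word counts (a bag of words) that naive Bayes can then use as features. Below is a minimal plain-Java sketch of that idea only. It is not WEKA's implementation and does not use the WEKA API; the class name `BagOfWordsSketch`, its methods, and the sample strings are all made up for illustration. What I can't work out is how to get WEKA to do the equivalent on the SpamAssassin files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration: NOT WEKA's StringToWordVector, just the
// bag-of-words idea (text -> word-count vector) in plain Java.
final class BagOfWordsSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collect every distinct word in the corpus and assign it a column
    // index, in alphabetical order.
    static Map<String, Integer> vocabulary(List<String> docs) {
        TreeSet<String> words = new TreeSet<>();
        for (String doc : docs) {
            words.addAll(tokenize(doc));
        }
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String word : words) {
            vocab.put(word, vocab.size());
        }
        return vocab;
    }

    // Count how often each vocabulary word occurs in one document.
    // Words that are not in the vocabulary are ignored.
    static int[] vectorize(String doc, Map<String, Integer> vocab) {
        int[] counts = new int[vocab.size()];
        for (String token : tokenize(doc)) {
            Integer index = vocab.get(token);
            if (index != null) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample messages, not taken from the SpamAssassin data.
        List<String> docs = Arrays.asList(
                "Buy cheap pills now",
                "Meeting now, bring notes");
        Map<String, Integer> vocab = vocabulary(docs);
        System.out.println(vocab.keySet());
        for (String doc : docs) {
            System.out.println(Arrays.toString(vectorize(doc, vocab)));
        }
    }
}
```

Each row printed by `main` is one message, and each column is one word from the vocabulary. I assume the WEKA filter produces something similar as attributes, but I have not been able to confirm that.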