I'm new to Mallet. I usually use Weka for classification, and now I'm trying to use Mallet for text classification. In Weka, we choose the attributes ourselves (such as word length or top-n word occurrence) and build the .arff file from them.
I have read about Mallet's input format at http://mallet.cs.umass.edu/import.php, but I'm still confused. How do we assign attributes in the input format? How do we specify that a document belongs to a certain class, for example the "sports" class?
Any example of an input format file would be much appreciated.
Thanks!