SVM newbie here. I have 160 categories, each defined by a list of member terms and phrases that I want to use as training data. Some categories have only a few phrases; others have hundreds.
I also have a lot of text to test against, covering a wide variety of topics. I think I want a multiclass SVM built from one-vs-rest binary classifiers.
1) Should the training input for one category's SVM be a set of lines like 1 feature3:1 feature5:1 ... for the positive members, where each feature is a term/phrase from that class's membership list, and lines like -1 feature1:1 feature2:1 feature4:1 ... for all members of the other classes in the dictionary of known_terms_of_interest? Is a binary feature value sufficient?
2) Should the input for the test documents include only terms found in the dictionary of known_terms_of_interest?
3) Is a linear kernel with C = 1 the right choice? Or, because some categories have only a few terms, should I use RBF?
The examples I've found all begin with preprocessed feature files rather than raw text, so I'm missing the key setup steps; the documentation goes straight into descriptions of margins and the like.
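To make questions 1 and 2 concrete, here is roughly the preprocessing I imagine, as a sketch only. The toy categories and all function names are made up for illustration. I'm assuming the usual sparse "label index:value" line format with 1-based integer feature ids in ascending order, and I'm assuming each member phrase becomes one training line, which may itself be the wrong setup.

```python
import re

# Hypothetical toy data standing in for the 160 real category lists.
CATEGORIES = {
    "fruit": ["apple", "banana", "fruit salad"],
    "vehicle": ["car", "truck", "pickup truck"],
}

def build_index(categories):
    """Map every known term/phrase to a 1-based integer feature id."""
    terms = sorted({t for members in categories.values() for t in members})
    return {term: i for i, term in enumerate(terms, start=1)}

def encode(text, index):
    """Return ascending feature ids of known terms present in the text.

    Binary presence only; words not in the dictionary are ignored.
    """
    lowered = text.lower()
    return sorted(
        fid
        for term, fid in index.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )

def to_line(label, feature_ids):
    """Format one example as 'label id:1 id:1 ...'."""
    feats = " ".join(f"{fid}:1" for fid in feature_ids)
    return f"{label} {feats}".rstrip()

def one_vs_rest_lines(target, categories, index):
    """One line per member phrase: +1 for the target category, -1 otherwise."""
    lines = []
    for name, members in categories.items():
        label = "+1" if name == target else "-1"
        for phrase in members:
            lines.append(to_line(label, encode(phrase, index)))
    return lines

index = build_index(CATEGORIES)
fruit_training = one_vs_rest_lines("fruit", CATEGORIES, index)
test_features = encode("I drove my truck to buy an apple and a pear", index)
```

With this layout I would write one training file per category (160 in total), and encode every test document with the same index, so unknown words such as "pear" simply drop out.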