I am trying to use scikit-learn for Naive Bayes classification. I have a couple of questions (I am also new to scikit-learn).
1) scikit-learn estimators expect the input as a NumPy array and the labels as arrays. For text classification, should I map each word to a number (ID) by maintaining a hash of the vocabulary words, each with a unique ID? Is this standard practice in scikit-learn?
2) When the same text belongs to more than one class, how should I proceed? One obvious way is to replicate each training example, once for each associated label. Is there a better representation?
3) Similarly, for the test data, how do I get more than one class associated with a test example?
I am using http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html as my base.
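To make the three questions concrete, here is a minimal sketch of the kind of setup I am asking about. It assumes that CountVectorizer (for the word-to-ID mapping), MultiLabelBinarizer (for the label representation), and OneVsRestClassifier wrapped around MultinomialNB (for predicting several classes) are a reasonable fit; I am not sure this is the recommended approach. The documents and labels are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each document may carry more than one label.
train_docs = [
    "the team won the football match",
    "the election results were announced",
    "the minister attended the football final",
    "parliament passed the new budget",
]
train_labels = [["sports"], ["politics"], ["sports", "politics"], ["politics"]]

# Question 1: CountVectorizer builds the word -> column-index mapping
# (stored in vectorizer.vocabulary_) and returns a sparse count matrix.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Question 2: instead of replicating examples, encode the label sets as a
# binary indicator matrix with one column per class.
binarizer = MultiLabelBinarizer()
Y_train = binarizer.fit_transform(train_labels)

# One binary MultinomialNB per class (one-vs-rest).
clf = OneVsRestClassifier(MultinomialNB())
clf.fit(X_train, Y_train)

# Question 3: predict an indicator row per test document, then map the
# columns back to label names. Reuse the fitted vectorizer (transform only).
X_test = vectorizer.transform(["the football match"])
Y_pred = clf.predict(X_test)
predicted_labels = binarizer.inverse_transform(Y_pred)
print(predicted_labels)
```

Here `predicted_labels` is a list with one tuple of label names per test document; a tuple can be empty or hold several labels, depending on which per-class classifiers fire.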