multi-label supervised classification of text data

Question

I am working on a machine learning problem in Python, and I don't have much machine learning experience. The problem comes with a training dataset that contains text samples and the labels for those samples. All possible label values are given, so this is a supervised problem. Some text samples have an empty set of labels. I need to build a model that predicts the labels for a given text.

So far, I have created a pandas dataframe from the training data. Its columns are [text_data, label1, label2, label3, ..., labeln], and each label column holds either 0 or 1. I then cleaned and tokenized text_data, removed stop words from the tokens, and stemmed the tokens with PorterStemmer. I split the dataframe into training and validation data in an 80:20 ratio. Now I am trying to build a model on the training data and use it to predict the labels of the validation data, but I am confused about how to build that model. I tried a few things, such as a Naive Bayes classifier, but it didn't work, or maybe I made a mistake somewhere. Any idea how I should proceed?
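For illustration, one common way to handle a layout with one 0/1 column per label is "binary relevance": train an independent binary classifier for each label column and combine their outputs. The block below is a minimal sketch of that idea with a small hand-written multinomial Naive Bayes, using only the standard library. The toy rows, the stop-word list, and the names `BinaryNB`, `fit_binary_relevance`, and `predict_labels` are all hypothetical and not from the original post. Stemming and the 80:20 split are left out to keep it short.

```python
import math
from collections import Counter

# Hypothetical toy data standing in for the dataframe rows:
# (text_data, [label1, label2]). Not taken from the original post.
ROWS = [
    ("cheap flights to paris", [1, 0]),
    ("paris hotel deals", [1, 0]),
    ("python pandas dataframe error", [0, 1]),
    ("pandas merge two dataframes", [0, 1]),
    ("book flights with python script", [1, 1]),
    ("weather is nice today", [0, 0]),  # sample with an empty label set
]
STOP_WORDS = {"to", "is", "with", "the", "a"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


class BinaryNB:
    """Multinomial Naive Bayes for a single 0/1 label, with add-one smoothing."""

    def fit(self, docs, ys):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = {0: 0, 1: 0}
        for tokens, y in zip(docs, ys):
            self.counts[y].update(tokens)
            self.n_docs[y] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def _log_score(self, tokens, y):
        total = sum(self.counts[y].values())
        # Smoothed prior, so a class absent from training is still defined.
        score = math.log((self.n_docs[y] + 1) / (sum(self.n_docs.values()) + 2))
        for t in tokens:
            if t in self.vocab:  # skip words never seen in training
                score += math.log((self.counts[y][t] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, tokens):
        return int(self._log_score(tokens, 1) > self._log_score(tokens, 0))


def fit_binary_relevance(docs, label_rows):
    """Train one independent BinaryNB per label column."""
    n_labels = len(label_rows[0])
    return [
        BinaryNB().fit(docs, [row[j] for row in label_rows])
        for j in range(n_labels)
    ]


def predict_labels(models, tokens):
    """Return a 0/1 vector with one entry per label."""
    return [m.predict(tokens) for m in models]


docs = [tokenize(text) for text, _ in ROWS]
labels = [label_row for _, label_row in ROWS]
models = fit_binary_relevance(docs, labels)

print(predict_labels(models, tokenize("flights to paris")))
print(predict_labels(models, tokenize("pandas dataframe")))
```

Because every label gets its own yes/no decision, a text can come back with several labels or with none, which matches samples whose label set is empty. On real data the per-label predictions would be compared against the held-out 20% rather than against hand-picked strings.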

Either debug your code or figure out how to select better features. Since you tagged this `nltk`, read the nltk book chapters on classification. — alexis, Apr 07 '17 at 08:56
Show us what you have tried in code? Full trace of error if any. — Vivek Kumar, Apr 10 '17 at 05:20

0 Answers