I am working on a machine learning problem in Python, and I don't have much experience with machine learning. I have been given a training dataset that consists of text samples and the labels for those samples. All possible label values are known, so this is a supervised problem. Some text samples have an empty set of labels. I need to build a model that predicts the labels for given text data.
What I have done so far: I created a pandas DataFrame from the training data, with columns [text_data, label1, label2, label3, ..., labeln]. The values in the label columns are either 0 or 1. Then I cleaned and tokenized text_data, removed stop words from the tokens, and stemmed the tokens using PorterStemmer. I split the DataFrame into training and validation data in an 80:20 ratio. Now I am trying to build a model that is trained on the training data and predicts the labels of the validation data, but I am confused about how to set the model up. I tried a few things, such as a Naive Bayes classifier, but it didn't work, or maybe I made a mistake. Any ideas on how I should proceed?
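For reference, the steps described above look roughly like the sketch below. This is only an illustration, not my actual code: the toy rows, the two label columns, the small STOP_WORDS set, the preprocess helper, and the use of DataFrame.sample for the 80:20 split are all placeholders I am assuming here. It assumes pandas and NLTK are installed, and it stops before the modelling step, which is the part I am stuck on.

```python
import re

import pandas as pd
from nltk.stem import PorterStemmer

# Toy data standing in for the real training set (placeholder rows and labels).
df = pd.DataFrame({
    "text_data": [
        "The cats are running in the garden",
        "Stock prices are falling again",
        "A new phone is released today",
        "The team is winning the match",
        "Rain is expected in the evening",
        "The bank raised interest rates",
        "Dogs and cats playing together",
        "The market opened higher",
        "Players are training for the final",
        "Nothing relevant in this sample",
    ],
    "label1": [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    "label2": [0, 1, 0, 0, 0, 1, 0, 1, 0, 0],  # the last row has no labels at all
})

# Small illustrative stop word list; a real run would use a fuller list.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "in", "for", "this"}
stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok not in STOP_WORDS]


df["tokens"] = df["text_data"].apply(preprocess)

# 80:20 split into training and validation rows (fixed seed for repeatability).
train_df = df.sample(frac=0.8, random_state=0)
valid_df = df.drop(train_df.index)

# Every column whose name starts with "label" holds a 0/1 indicator.
label_cols = [col for col in df.columns if col.startswith("label")]
```

After this, train_df and valid_df each hold the stemmed tokens plus one 0/1 column per label, and the open question is what model to fit on them.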