
I have two columns in an Excel file. The first column (labeled ROW 1 below) holds the exact user input, and the second (ROW 2) holds its cause, e.g.

ROW 1                                     ROW 2
money deducted                            cause 1
delivery is late                          cause 2
something here                            cause 48
payment problem                           cause 1
.                                         .
.                                         .

The task is to implement a classifier that learns from these cases so that, when a new user input is given, it can assign it to one of the causes.

I have some knowledge of classification, but I would like an idea of how to implement this using a one-vs-rest classifier.

  • Try reading about classification algorithms such as the `Naive Bayes Classifier`. Link to a simple tutorial: https://www.analyticsvidhya.com/blog/2015/09/naive-bayes-explained/ – Thiru Mar 10 '17 at 10:49
  • @Thiru Any idea how to implement this using a one-vs-rest classifier? – Navya Singhal Mar 10 '17 at 10:56

1 Answer


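This is how you can implement the classifier using scikit-learn. Pass all training sentences as X_train and, for each sentence, the index of its cause in target_names as y_labels. Keep the MultiLabelBinarizer in a variable (mlb), because it is needed again to decode the predictions.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

X_train = np.array(["money deducted",
                    "delivery is late",
                    "something here",
                    "payment problem"])
# One tuple per sentence: the 0-based index of its cause in target_names
y_labels = [(0, ), (1, ), (2, ), (0, )]
target_names = ['cause1', 'cause2', 'cause48']
# Fit the binarizer and turn the labels into a binary indicator matrix
mlb = MultiLabelBinarizer()
y_train = mlb.fit_transform(y_labels)
classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, y_train)

That is all it takes to train the classifier. For more reference: http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html

Then predict as follows, where X_test is an array of new sentences:

mlb.inverse_transform(classifier.predict(X_test))

This gives you a tuple of label indices for each sentence, which you can then use as indices into target_names.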
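For completeness, here is a minimal end-to-end sketch of the steps above, from training to mapping the predicted indices back to names in target_names. The test sentence "payment deducted" is a made-up input for illustration, and the labels are assumed to be 0-based indices into target_names.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

target_names = ["cause1", "cause2", "cause48"]
X_train = np.array(["money deducted",
                    "delivery is late",
                    "something here",
                    "payment problem"])
# Assumption: each label is the 0-based index of the cause in target_names
y_labels = [(0,), (1,), (2,), (0,)]

# Keep a reference to the binarizer so predictions can be decoded later
mlb = MultiLabelBinarizer()
y_train = mlb.fit_transform(y_labels)

classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
classifier.fit(X_train, y_train)

# Hypothetical new user input, not part of the training data
X_test = np.array(["payment deducted"])
predicted = mlb.inverse_transform(classifier.predict(X_test))

for sentence, label_indices in zip(X_test, predicted):
    # label_indices is a tuple and may be empty if no cause was predicted
    names = [target_names[i] for i in label_indices]
    print(sentence, "->", names if names else "no cause predicted")
```

Note that inverse_transform can return an empty tuple for a sentence when none of the per-cause classifiers fires, so the loop handles that case explicitly.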

Hope it helps!

abhinav
  • I am a little confused about the imports for this. I have imported `sklearn`, but there are still errors. What else do I need to import? – Navya Singhal Mar 10 '17 at 11:46
  • `import numpy as np` `from sklearn.pipeline import Pipeline` `from sklearn.feature_extraction.text import CountVectorizer` `from sklearn.svm import LinearSVC` `from sklearn.feature_extraction.text import TfidfTransformer` `from sklearn.multiclass import OneVsRestClassifier` `from sklearn.preprocessing import MultiLabelBinarizer` – abhinav Mar 10 '17 at 11:47
  • Thanks a lot! Saved my day! :) – Navya Singhal Mar 10 '17 at 12:04
  • I have a question: what if the same user input appears more than once in Row 1 with a different issue in Row 2 each time? The classifier does not predict anything in that case. What could be a possible fix? – Navya Singhal Mar 13 '17 at 11:02
  • You may use a threshold in such cases. If the score predicted for a cause is above a certain threshold, that cause is accepted; otherwise it is rejected. – abhinav Mar 15 '17 at 04:42
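
One possible reading of the threshold suggestion, as a rough sketch rather than the answerer's exact method: take the per-cause scores from decision_function, accept every cause whose score is above a threshold, and fall back to the single highest-scoring cause when none is. The helper name predict_with_fallback and its threshold parameter are hypothetical and not part of scikit-learn. The duplicated "payment problem" row is made-up data that mimics the conflicting-label case.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Assumed data: "payment problem" appears twice with different causes
X_train = np.array(["money deducted",
                    "delivery is late",
                    "something here",
                    "payment problem",
                    "payment problem"])
y_labels = [(0,), (1,), (2,), (0,), (1,)]

mlb = MultiLabelBinarizer()
classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
classifier.fit(X_train, mlb.fit_transform(y_labels))


def predict_with_fallback(clf, binarizer, sentences, threshold=0.0):
    """Hypothetical helper: labels scoring above `threshold`, else the best one."""
    # One row per sentence, one column per label in binarizer.classes_
    scores = clf.decision_function(sentences)
    results = []
    for row in scores:
        above = np.flatnonzero(row > threshold)
        if above.size == 0:
            # Nothing passed the threshold: keep the highest-scoring label
            above = np.array([row.argmax()])
        results.append([int(binarizer.classes_[i]) for i in above])
    return results


result = predict_with_fallback(classifier, mlb, ["payment problem"])
print(result)
```

With this approach every input gets at least one cause, and raising or lowering threshold trades off how readily additional causes are accepted.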