
I am using the textblob library for classification with Naive Bayes. I have a training set, and when I pass a word I want the classifier to check it against the training data and classify accordingly; if the word is not present in the training data, it should not suggest any classification.

Example: "kartik" is not in the training set, yet it is classified as '1', and the same happens for any other word that is not present in the training set.

Is there any way to make it not return '1' when I pass a word that is not in the training data?

from textblob import TextBlob
from textblob.classifiers import NaiveBayesClassifier


train = [
    ('System is working fine', '1'),
    ('Issue Resolved', '1'),
    ('Working Fine', '1'),
    ('running smoothly', '1'),
    ('server is working fine', '1'),
    ('software installed properly', '1'),
    ('Ticket resolved', '1'),
    ('Laptop is not working', '-1'),
    ('laptop issue', '-1'),
    ('upgrade laptop', '-1'),
    ('software not working', '-1'),
    ('fix the issue', '-1'),
    ('WIFI is not working', '-1'),
    ('server is down', '-1'),
    ('system is not working', '-1')
]

c1 = NaiveBayesClassifier(train)
c1.classify("kartik")
Dexter1611

1 Answer


You can try getting the probability of the classification and then setting a threshold, ignoring class labels whose probability falls below it.

prob_dist = c1.prob_classify("Lorem Ipsum dolor sit amet")  # c1 is the classifier from your code
c1.classify("Lorem Ipsum dolor sit amet")
print(round(prob_dist.prob("1"), 2))
print(round(prob_dist.prob("-1"), 2))

0.61
0.39

I observed that all non-existing words get a probability of 0.61 for class '1'. You can use this as a starting point.

However, test all the correct cases properly: setting a threshold may adversely affect some correct classifications.
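The thresholding idea can be sketched as a small wrapper around the distribution returned by prob_classify. Note this is just a sketch of the approach: the helper name classify_with_threshold and the 0.7 cutoff are my own choices, not part of textblob, and the cutoff should be tuned against your data.

    def classify_with_threshold(prob_dist, threshold=0.7):
        """Return the most likely label only when its probability clears
        the threshold; otherwise return None (i.e. no suggestion)."""
        label = prob_dist.max()              # most probable label
        if prob_dist.prob(label) >= threshold:
            return label
        return None

    # With the classifier from the question (assumes c1 is already trained):
    # classify_with_threshold(c1.prob_classify("kartik"))  # unknown word, weak score -> None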

In any case, increase the size of your training data and you will see better results, which can also help you choose a threshold.
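Another option, since the goal is specifically to reject words never seen in training, is to check the input against the training vocabulary before classifying at all. This is my own suggestion rather than anything built into textblob; the helper known_tokens and the naive whitespace/lowercase tokenization are assumptions.

    def known_tokens(text, train):
        """Return the input tokens that appear anywhere in the training sentences."""
        vocab = {w.lower() for sentence, _ in train for w in sentence.split()}
        return [t for t in text.lower().split() if t in vocab]

    # If known_tokens("kartik", train) is empty, skip classification entirely
    # instead of letting the classifier fall back to its prior.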

Yuvraj Jaiswal