
TL;DR

When implementing a chatbot application, the first task is to detect the intent of the user's input text. This is a typical multiclass text classification problem.

Intent is the class, and the number of classes to detect is finite. Many utterances (examples) are prepared for each class, and a model is trained on this dataset. Any input text goes through this trained model and comes out as a predicted class.

In detail, prediction works like this:

  • First, the chatbot gets the probabilities of all classes (e.g., using sklearn's predict_proba).
  • Then the chatbot finds the class with the top probability (the top class) and takes that as the predicted class.
  • Finally, one more step: if the top class's probability is less than a threshold (e.g., 0.2), the chatbot handles this input as OutOfBound (see the sketch below).
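For concreteness, here is a minimal sketch of that thresholded prediction step. It assumes "model" is any fitted sklearn classifier (or Pipeline) exposing predict_proba; the 0.2 threshold is just the example value above.

# Minimal sketch of thresholded intent prediction with OutOfBound handling.
# Assumption: "model" is any fitted sklearn classifier/Pipeline with predict_proba.
import numpy as np

OOB_THRESHOLD = 0.2  # example threshold from above

def predict_intent(model, text):
    probs = model.predict_proba([text])[0]   # probabilities of all classes (sum to 1)
    top_idx = int(np.argmax(probs))          # index of the top class
    top_class = model.classes_[top_idx]
    if probs[top_idx] < OOB_THRESHOLD:       # below threshold => OutOfBound
        return "OutOfBound", probs[top_idx]
    return top_class, probs[top_idx]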

This OutOfBound detection is very, very important for a chatbot, especially when the number of classes is small.

Let me show an extreme situation. Suppose the chatbot wants to detect only yes|no. The training dataset looks like this:

<class>   <examples>
yes     yes
yes     ok
yes     I want
no      no
no      nope
no      I hate

After training, feed in input text like this:

I want it => yes (0.89), no (0.11)

# The sum of the probabilities is 1.

Of course, yes is the top class and becomes the predicted class. That's right. But let's look at another input:

I'm going home  => yes (0.54), no (0.46)

# The sum of the probabilities is 1.

The top class is yes. But do you think this is the right result? I want a result like this:

I'm going home  => yes (0.013), no (0.011) 

# The sum of the probabilities is NOT 1.
# Just a probability for each class, independently. (How to calculate that? I don't know. That's the point.)

The top class is yes, but the threshold is 0.2, so the final result is OutOfBound. The chatbot might then return output like "I don't understand what you're saying. Please input it again."

====================

I'm using Tfidf + LogisticRegression (sklearn). Too simple? But it works very well when the number of classes is large (e.g., over 100).
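For reference, a minimal sketch of this setup on the tiny yes/no dataset above could look like this (variable names are my own):

# Tfidf + LogisticRegression baseline on the toy yes/no dataset from above.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts  = ["yes", "ok", "I want", "no", "nope", "I hate"]
labels = ["yes", "yes", "yes", "no", "no", "no"]

model = make_pipeline(
    TfidfVectorizer(),                  # tfidf features
    LogisticRegression(max_iter=1000),  # linear classifier; predict_proba sums to 1
)
model.fit(texts, labels)

print(model.predict_proba(["I want it"]))       # high probability for "yes"
print(model.predict_proba(["I'm going home"]))  # still sums to 1, even for OutOfBound input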

On the other hand, when the number of classes is small, the prediction is ridiculous very often, and the chatbot returns ridiculous output at those times.

The cause of this error is that most multiclass text classification models have a softmax in the last step. Softmax normalizes the probabilities of all classes so that they sum to 1, so an arbitrary class may be predicted for OutOfBound input.
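A tiny numerical illustration (the numbers are my own): even when all class scores are low, softmax forces the outputs to sum to 1, so one class still looks plausible.

# Softmax hides "all classes are weak": low logits still normalize to confident-looking probabilities.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

weak_scores = np.array([-4.0, -4.2])  # hypothetical low logits for "yes" and "no"
print(softmax(weak_scores))           # roughly [0.55, 0.45] -- looks confident, but isn't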

This situation is the same even with more complex neural networks (FNN, MLP, RNN, CNN, even BERT), because a softmax is still the last step of the model.

Softmax performs very well when every input must be assigned to one of the classes. But if detecting OutOfBound is required and the number of classes is small, the model's predictions are ridiculous very often.

Are you expecting OvR (one-versus-rest) to have no such issue? The issue is the same.

Actually, the root cause is that supervised learning cannot detect OutOfBound, because the training dataset doesn't include OutOfBound data. That is, detecting OutOfBound cannot coexist with purely supervised learning.

====================

So, how can I implement an OutOfBound-detectable multiclass text classifier? Of course, this involves unsupervised learning.

I tried the following.

1) Topic analysis

  • Regarding each class as a topic, calculate which topic the input text is closest to.
  • For this, use semi-supervised LDA (e.g., GuidedLDA) and train on the dataset so that examples are assigned to their corresponding topics.
  • After that, predict the probabilities of all topics for the input text and pick the topic with the top probability. This top topic is the predicted class (a rough sketch follows).
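I won't reproduce the GuidedLDA setup here; the sketch below only shows the prediction idea, using plain (unseeded) sklearn LDA as a stand-in, so the topic-to-class mapping is just an assumption for illustration.

# Rough sketch of the topic-based prediction idea (plain LDA as a stand-in for GuidedLDA).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

texts = ["yes", "ok", "I want", "no", "nope", "I hate"]

vec = CountVectorizer()
X = vec.fit_transform(texts)

# GuidedLDA would seed each topic with class keywords; here the topics are unseeded.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

topic_probs = lda.transform(vec.transform(["I'm going home"]))[0]  # topic distribution
print(topic_probs)  # take the top topic and map it to its class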

Performance is very bad.

2) Similarity

  • Just prepare tfidf or word2vec vectors for the examples in the training dataset.
  • When an input text is received, calculate its similarity to every example in the training dataset.
  • Then find the most similar example for each class and regard that similarity as the class's probability.
  • Finally, pick the class with the top probability.

I tried cosine similarity (using tfidf or word2vec/fasttext), SIF (Smooth Inverse Frequency), and WMD (Word Mover's Distance); a sketch of the tfidf variant is below.
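A minimal sketch of the tfidf + cosine similarity variant (the word2vec / SIF / WMD variants follow the same pattern):

# Per-class similarity scores: best-matching training example per class.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts  = ["yes", "ok", "I want", "no", "nope", "I hate"]
labels = np.array(["yes", "yes", "yes", "no", "no", "no"])

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

def class_similarities(text):
    sims = cosine_similarity(vec.transform([text]), X)[0]  # similarity to every training example
    # per class, keep the best-matching example and use that similarity as the class score
    return {c: float(sims[labels == c].max()) for c in np.unique(labels)}

print(class_similarities("I want it"))       # "yes" should score highest
print(class_similarities("I'm going home"))  # both scores low => OutOfBound-ish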

Performance is very bad.

3) Simple multiclass text classification weighted by cosine similarity

Here, "simple" means a 1-layer FNN or LogisticRegression.

  • First, calculate the multiclass text classification probabilities of all classes for the input text (using tfidf).
  • Second, calculate the cosine similarities using method 2) (using tfidf).
  • Third, multiply the two results.

For example, if

yes (0.54), no (0.45)  ==> multiclass text classification
yes (0.12), no (0.11)  ==> cosine similarity

then

yes (0.54*0.12), no (0.45*0.11)  ==> final probabilities

that is,

yes (0.0648), no (0.0495)

which results in OutOfBound.

Actually, a plain multiplication is not good, so I used a kind of weighted function; a sketch of the plain version follows.
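Sketched with the plain multiplication (the actual weighting function is not reproduced here, and the threshold value is only illustrative):

# Combine classifier probabilities with per-class similarity scores, then apply an OutOfBound threshold.
OOB_THRESHOLD = 0.1  # illustrative threshold for the combined score, not the real value

def combined_predict(clf_probs, sim_scores):
    # clf_probs:  class -> multiclass classification probability (sums to 1)
    # sim_scores: class -> best cosine similarity for that class
    final = {c: clf_probs[c] * sim_scores[c] for c in clf_probs}
    top_class = max(final, key=final.get)
    if final[top_class] < OOB_THRESHOLD:
        return "OutOfBound", final
    return top_class, final

clf_probs  = {"yes": 0.54, "no": 0.45}  # numbers from the example above
sim_scores = {"yes": 0.12, "no": 0.11}
print(combined_predict(clf_probs, sim_scores))
# -> OutOfBound, with combined scores around {"yes": 0.065, "no": 0.050}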

Performance is quite good.

====================

Method 3) above performs quite well, so that's what I'm using.

That is, simple multiclass text classification with tfidf beats all the other models in accuracy and speed, but it cannot detect OutOfBound. So I use it together with the similarity score as a complement.

Finally: quite good.

But I'm not completely satisfied. Is this really the best approach? I'm not sure.

I have been researching and googling for years, and I have not found any solution.

How do you all solve this problem?

Is there an exact solution for this? Is there any model or proposal?

Or do you all use your own complementary method, like me?

Or should I use an entirely different model?

Sharing your experience or advice would make me very happy.

Thanks in advance. Sincerely.
