Text classification into predefined categories

Question

I am trying to classify text data into a few categories. But in the data set, there can be data that does not belong to any of the defined categories.

After deployment, the final product should also be able to deal with text data that does not belong to any of the predefined categories.

To implement this, I am currently using an SVM text classifier, and I am planning to define an additional category called "non" to deal with the data that does not belong to any of the predefined categories.

Is this a correct approach?

score 0 · Answer 1 · answered Mar 06 '20 at 12:12

Yes, that would work. "non" is essentially just one more class: the classifier learns to assign to it the kind of documents that are labeled "non" in your training data.

So when you use your final product, it will try to classify the new text data into the classes, including "non".
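As a minimal sketch of that setup, assuming scikit-learn with TF-IDF features and a linear SVM (the toy texts and the "sports"/"tech" labels are made up for illustration, not taken from your data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training set; "non" marks texts outside the real categories.
train_texts = [
    "the team won the football match",
    "the striker scored two goals",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
    "my cat slept on the sofa all day",
    "we baked bread for the weekend",
]
train_labels = ["sports", "sports", "tech", "tech", "non", "non"]

# TF-IDF features feeding a linear SVM; "non" is trained like any other class.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)

# New texts are assigned to one of the trained classes, which includes "non".
new_texts = ["the goalkeeper saved a penalty", "it rained during our picnic"]
predictions = model.predict(new_texts)
print(list(zip(new_texts, predictions)))
```

Note that the classifier can only learn "non" from the examples you give it, so the "non" training documents should be as varied as the out-of-category text you expect to see after deployment.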
