Questions tagged [text-classification]

Simply put, text classification means assigning a piece of text to one or more categories from a (mostly predefined) set. It is an important problem that arises in many real-world applications. For example, an automated call centre may want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here, the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • binary (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).
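The three output types above differ mainly in how the target is encoded. Here is a minimal sketch in plain Python, assuming a hypothetical set of three complaint categories (the category names and helper functions are illustrative and not taken from any library):

```python
# Hypothetical label set for a call-centre complaint classifier (illustrative only).
CATEGORIES = ["billing", "network", "hardware"]


def one_hot(label, categories=CATEGORIES):
    """Multi-class target: exactly one of the k categories is active."""
    return [1 if c == label else 0 for c in categories]


def multi_hot(labels, categories=CATEGORIES):
    """Multi-label target: any subset of the k categories may be active."""
    active = set(labels)
    return [1 if c in active else 0 for c in categories]


# Binary: a single yes/no decision (e.g. "is this text a complaint?").
binary_target = 1
# Multi-class: one category out of k.
multiclass_target = one_hot("network")
# Multi-label: a set of categories out of k.
multilabel_target = multi_hot(["billing", "hardware"])

print(binary_target, multiclass_target, multilabel_target)
```

A multi-class target always sums to one, while a multi-label target can sum to anything from zero to k.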

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
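To see why text features tend to be sparse, consider a bag-of-words representation: every distinct word in the corpus becomes a feature, but each document contains only a few of them. The sketch below uses only the standard library and a made-up three-document corpus; real projects would more likely use a vectorizer from a library such as scikit-learn.

```python
from collections import Counter

# Made-up example documents (illustrative only).
docs = [
    "my internet connection keeps dropping",
    "i was charged twice on my bill",
    "the router you sent me is broken",
]

# Vocabulary: one feature (column) per distinct word in the corpus.
vocabulary = sorted({word for doc in docs for word in doc.split()})
column_of = {word: i for i, word in enumerate(vocabulary)}


def bag_of_words(doc):
    """Sparse row: only words that occur are stored, as {column index: count}."""
    counts = Counter(doc.split())
    return {column_of[word]: count for word, count in counts.items()}


sparse_rows = [bag_of_words(doc) for doc in docs]

# Fraction of the full documents-by-vocabulary matrix that is non-zero.
nonzero = sum(len(row) for row in sparse_rows)
density = nonzero / (len(docs) * len(vocabulary))
print(f"{len(vocabulary)} features, density = {density:.2f}")
```

With a realistic corpus the vocabulary grows to tens of thousands of words while each document still uses only a handful, so the density drops far below what this toy example shows.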

1694 questions
14
votes
1 answer

Scikit learn - fit_transform on the test set

I am struggling to use Random Forest in Python with Scikit learn. My problem is that I use it for text classification (in 3 classes - positive/negative/neutral) and the features that I extract are mainly words/unigrams, so I need to convert these to…
13
votes
2 answers

How to convert a saved model from sklearn into tensorflow/lite

I want to implement a classifier using the sklearn library. Is there a way to save the model, or convert the file into a saved TensorFlow file, so that it can be converted to TensorFlow Lite later?
13
votes
4 answers

Prevent over-fitting of text classification using Word embedding with LSTM

Objective: identify the class label from a user-entered question (like a question-answering system). The data is extracted from a large PDF file, and the page number needs to be predicted from the user input. This is mainly used for policy documents, where users have questions about…
Somnath Kadam
13
votes
4 answers

Scalable or online out-of-core multi-label classifiers

I have been blowing my brains out over the past 2-3 weeks on this problem. I have a multi-label (not multi-class) problem where each sample can belong to several of the labels. I have around 4.5 million text documents as training data and around 1…
12
votes
1 answer

How to use Hugging Face Transformers library in Tensorflow for text classification on custom data?

I am trying to do binary text classification on custom data (which is in csv format) using different transformer architectures that Hugging Face 'Transformers' library offers. I am using this Tensorflow blog post as reference. I am loading the…
12
votes
2 answers

expected dense to have shape but got array with shape

I am getting the following error while calling the model.predict function when running a text classification model in Keras. I have searched everywhere, but nothing has worked for me. ValueError: Error when checking input: expected dense_1_input to have…
Bhavesh Laddagiri
12
votes
2 answers

FastText using pre-trained word vector for text classification

I am working on a text classification problem, that is, given some text, I need to assign to it certain given labels. I have tried using fast-text library by Facebook, which has two utilities of interest to me: A) Word Vectors with pre-trained…
JarvisIA
11
votes
2 answers

InvalidArgumentError: 2 root error(s) found. Incompatible shapes in Tensorflow text-classification model

I am trying to get code working from the following repo, which is based off this paper. It had a lot of errors, but I mostly got it working. However, I keep getting the same problem and I really do not understand how to troubleshoot this/what is…
connor449
11
votes
3 answers

Why is the scikit-learn confusion matrix reversed?

I have 3 questions: 1) The confusion matrix for sklearn is laid out as follows (first row, then second row): TN | FP, FN | TP. When I look at online resources, however, I find it like this: TP | FP, FN | TN. Which one should I consider? 2) Since the above confusion matrix for scikit…
11
votes
1 answer

How to show topics of reuters dataset in Keras?

I use the Reuters dataset in Keras, and I want to know the names of the 46 topics. How can I show the topics of the Reuters dataset in Keras? https://keras.io/datasets/#reuters-newswire-topics-classification
hyeon
11
votes
1 answer

How do I properly combine numerical features with text (bag of words) in scikit-learn?

I am writing a classifier for web pages, so I have a mixture of numerical features, and I also want to classify the text. I am using the bag-of-words approach to transform the text into a (large) numerical vector. The code ends up being like…
11
votes
4 answers

How to deal with length variations for text classification using CNN (Keras)

It has been proved that CNN (convolutional neural network) is quite useful for text/document classification. I wonder how to deal with the length differences as the lengths of articles are different in most cases. Are there any examples in Keras? …
Fiong
11
votes
2 answers

SkLearn Multinomial NB: Most Informative Features

As my classifier yields about 99% accuracy on test data, I am a bit suspicious and want to gain insight into the most informative features of my NB classifier to see what kind of features it is learning. The following topic has been very useful: How…
11
votes
1 answer

Python text processing: AttributeError: 'list' object has no attribute 'lower'

I am new to Python and to Stack Overflow (please be gentle) and am trying to learn how to do sentiment analysis. I am using a combination of code I found in a tutorial and here: Python - AttributeError: 'list' object has no attribute However, I keep…
user3670554
10
votes
4 answers

Is it necessary to do stopwords removal, Stemming/Lemmatization for text classification while using Spacy, Bert?

Is stopwords removal, Stemming and Lemmatization necessary for text classification while using Spacy, Bert or other advanced NLP models for getting the vector embedding of the text? text="The food served in the wedding was very delicious" 1. since…
star