Questions tagged [text-classification]

Simply put, text classification means assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that occurs in many real-world applications. For example, an automated call centre might want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • one of two categories (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).
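
The three output types can be illustrated with a small sketch. The categories and keyword rules below are hypothetical and chosen only for illustration; a real classifier would learn its decision rule from labelled data rather than use a fixed keyword list.

```python
# Toy keyword-based "classifier" illustrating the three output types.
# The categories and keywords are made up for this example.
KEYWORDS = {
    "billing": {"invoice", "charge", "refund"},
    "network": {"signal", "internet", "outage"},
    "hardware": {"router", "phone", "cable"},
}

def scores(text):
    """Count keyword hits per category."""
    tokens = set(text.lower().split())
    return {label: len(tokens & words) for label, words in KEYWORDS.items()}

def classify_binary(text, label="billing"):
    """Binary: is the text about `label` or not?"""
    return scores(text)[label] > 0

def classify_multiclass(text):
    """Multi-class: exactly one category out of k."""
    s = scores(text)
    return max(s, key=s.get)

def classify_multilabel(text):
    """Multi-label: any subset of the k categories."""
    return {label for label, hits in scores(text).items() if hits > 0}

complaint = "no internet since the outage and my router is blinking"
print(classify_binary(complaint))
print(classify_multiclass(complaint))
print(sorted(classify_multilabel(complaint)))
```

The same input yields a single yes/no answer, a single label, or a set of labels depending on which formulation is used.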

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
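
To see why such features are sparse, here is a minimal bag-of-words sketch. It assumes whitespace tokenisation and a tiny made-up corpus; the helper names `to_sparse` and `to_dense` are hypothetical. Each document uses only a few words of the whole vocabulary, so most entries of its feature vector are zero and it is cheaper to store only the non-zero ones.

```python
from collections import Counter

# Tiny made-up corpus, for illustration only.
docs = [
    "my invoice shows a wrong charge",
    "the internet connection drops every evening",
    "please refund the wrong charge",
]

# Vocabulary: one feature per distinct word in the corpus.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}

def to_sparse(doc):
    """Bag-of-words vector stored as {feature_index: count}, zeros omitted."""
    counts = Counter(doc.split())
    return {index[w]: c for w, c in counts.items() if w in index}

def to_dense(sparse, size):
    """Expand a sparse vector into a full list of length `size`."""
    dense = [0] * size
    for i, c in sparse.items():
        dense[i] = c
    return dense

sparse_vectors = [to_sparse(d) for d in docs]
for vec in sparse_vectors:
    print(f"{len(vec)} of {len(vocab)} features are non-zero")
```

With a realistic vocabulary of tens of thousands of words, the fraction of non-zero entries per document becomes far smaller than in this toy corpus.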

1694 questions
8
votes
2 answers

TensorFlow - Text Classification using Neural Networks

Is there any example of how TensorFlow can be used for text classification using neural networks?
Sumit Chawla
8
votes
2 answers

Testing the NLTK classifier on specific file

The following code runs a Naive Bayes movie review classifier. The code generates a list of the most informative features. Note: the **movie review** folder is in the nltk. from itertools import chain from nltk.corpus import stopwords from…
ZaM
8
votes
3 answers

Dealing with class imbalance in multi-label classification

I've seen a few questions on class imbalance in a multiclass setting. However, I have a multi-label problem, so how would you deal with it in this case? I have a set of around 300k text examples. As mentioned in the title, each example has at least…
8
votes
1 answer

Naive Bayes probability always 1

I started using sklearn.naive_bayes.GaussianNB for text classification, and have been getting fine initial results. I want to use the probability returned by the classifier as a measure of confidence, but the predict_proba() method always returns…
AviM
7
votes
1 answer

Unable to train my keras model : (Data cardinality is ambiguous:)

I am using the bert-for-tf2 library to do a Multi-Class Classification problem. I created the model but training throws the following error: --------------------------------------------------------------------------- ValueError …
7
votes
3 answers

Improving on the basic, existing GloVe model

I am using GloVe as part of my research. I've downloaded the models from here. I've been using GloVe for sentence classification. The sentences I'm classifying are specific to a particular domain, say some STEM subject. However, since the existing…
cs95
7
votes
2 answers

GridSearchCV: How to specify test set?

I have a question regarding GridSearchCV: by using this: gs_clf = GridSearchCV(pipeline, parameters, n_jobs=-1, cv=6, scoring="f1") I specify that k-fold cross-validation should be used with 6 folds, right? So that means that my corpus is split into…
user3629892
7
votes
1 answer

How to classify URLs? what are URLs features? How to select and Extract features from URL

I have just started to work on a classification problem. It's a two-class problem: my trained model (machine learning) will have to decide/predict whether to allow a URL or block it. My question is very specific. How to classify URLs? Should I use…
6
votes
1 answer

Pre-Trained models for text Classification

So I have a few words without labels, but I need to classify them into 4-5 categories. I can see that this test set can be classified. However, I do not have training data, so I need to use a pre-trained model to classify these words. Which…
6
votes
1 answer

Passing multiple sentences to BERT?

I have a dataset with paragraphs that I need to classify into two classes. These paragraphs are usually 3-5 sentences long. The overwhelming majority of them are less than 500 words long. I would like to make use of BERT to tackle this problem. I am…
6
votes
1 answer

How can I use GPT 3 for my text classification?

I am wondering whether I can use OpenAI GPT-3 for transfer learning in a text classification problem. If so, how can I get started on it using Tensorflow, Keras?
6
votes
1 answer

Sliding window for long text in BERT for Question Answering

I've read a post which explains how the sliding window works but I cannot find any information on how it is actually implemented. From what I understand, if the input is too long, a sliding window can be used to process the text. Please correct me if I…
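
For the sliding-window idea mentioned in this question, here is a minimal generic sketch. It is not the implementation used by any particular BERT library: it assumes the text has already been turned into a flat list of tokens, and the function name `sliding_windows` and its parameters `max_len` and `stride` are hypothetical. A real question-answering setup would also need to reserve room in each window for the question and special tokens.

```python
def sliding_windows(tokens, max_len, stride):
    """Split `tokens` into overlapping windows of at most `max_len` tokens.

    Consecutive windows start `stride` tokens apart, so they overlap by
    `max_len - stride` tokens.
    """
    if not 0 < stride <= max_len:
        raise ValueError("stride must be in 1..max_len")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the text
        start += stride
    return windows

# Placeholder tokens t0..t9 stand in for real tokenizer output.
tokens = [f"t{i}" for i in range(10)]
chunks = sliding_windows(tokens, max_len=4, stride=2)
for chunk in chunks:
    print(chunk)
```

Each window can then be passed to the model separately, and the overlap reduces the chance that an answer span is cut at a window boundary.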
6
votes
2 answers

No batch_size while making inference with BERT model

I am working on a binary classification problem with a Tensorflow BERT language model. Here is the link to google colab. After saving and loading the trained model, I get an error while doing the prediction. Saving the Model def…
joel
6
votes
1 answer

Keras Embedding Layer: keep zero-padded values as zeros

I've been thinking about 0-padding of word sequences and how that 0-padding is then converted by the Embedding layer. At first glance, one would think that you want to keep the embeddings = 0.0 as well. However, the Embedding layer in keras generates…
6
votes
1 answer

How to recognize entities in text that is the output of optical character recognition (OCR)?

I am trying to do multi-class classification with textual data. The problem I am facing is that I have unstructured textual data. I'll explain the problem with an example. Consider this image for example: I want to extract and classify text information…