Questions tagged [text-classification]

Simply put, text classification is the task of assigning a piece of text to one or more categories from a (mostly predefined) set. It is an important problem that arises in many real-world applications. For example, an automated call centre may want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • binary (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).
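
The three output types above differ mainly in how the target is encoded. A minimal sketch, assuming a hypothetical set of k = 3 call-centre categories (the category names and helper functions are invented for illustration and do not come from any library):

```python
# Hypothetical category names, chosen only for illustration.
CATEGORIES = ["billing", "network", "hardware"]


def encode_binary(is_positive):
    """Binary: a single 0/1 target."""
    return 1 if is_positive else 0


def encode_multiclass(category):
    """Multi-class: exactly one index in range(k)."""
    return CATEGORIES.index(category)


def encode_multilabel(categories):
    """Multi-label: a k-length 0/1 vector; any number of entries may be 1."""
    chosen = set(categories)
    return [1 if c in chosen else 0 for c in CATEGORIES]


binary_target = encode_binary(True)
multiclass_target = encode_multiclass("network")
multilabel_target = encode_multilabel(["billing", "hardware"])
```

A multi-class target is a single index, while a multi-label target is a vector in which several positions may be set at once.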

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
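
To illustrate why such features tend to be sparse, here is a small bag-of-words sketch using only the standard library. The three example documents are made up, and the whitespace tokenisation is a simplifying assumption; real pipelines usually rely on a library vectoriser.

```python
from collections import Counter

# Made-up example documents, for illustration only.
docs = [
    "my internet connection keeps dropping",
    "i was charged twice on my bill",
    "the router will not turn on",
]

# Vocabulary: every distinct token across the corpus, in sorted order.
vocab = sorted({tok for doc in docs for tok in doc.split()})


def bag_of_words(doc):
    """Return a sparse {column_index: count} mapping for one document."""
    counts = Counter(doc.split())
    return {i: counts[tok] for i, tok in enumerate(vocab) if tok in counts}


sparse_rows = [bag_of_words(d) for d in docs]

# Fraction of the full document-term matrix that is non-zero.
nonzero = sum(len(row) for row in sparse_rows)
density = nonzero / (len(docs) * len(vocab))
```

Each document touches only a handful of vocabulary columns, so storing just the non-zero entries is far cheaper than a dense row, and the gap widens as the vocabulary grows.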

1694 questions
4 votes, 4 answers

Good training data for text classification by LDA?

I'm classifying content based on LDA into generic topics such as Music, Technology, Arts, Science. This is the process I'm using: 9 topics -> Music, Technology, Arts, Science, etc. 9 documents -> Music.txt, Technology.txt, Arts.txt, Science.txt…
4 votes, 2 answers

Python -- SciKit -- Text Feature Extraction of Classifier

I have to classify articles into my custom categories, so I chose MultinomialNB from SciKit. I am doing supervised learning: I have an editor who looks at the articles daily and then tags them. Once they are tagged I include them into my Learning…
planet260
4 votes, 2 answers

Using bag of words classifier on an out-of-sample dataset

I recently used a Bag-of-Words classifier to make a Document Matrix with 96% terms. Then I used a Decision Tree to train my model on the bag of words input to predict whether the sentence is important or not. The model performed really well…
4 votes, 1 answer

Feature Construction for Text Classification using Autoencoders

Autoencoders can be used to reduce dimensionality in feature vectors, as far as I understand. In text classification a feature vector is normally constructed via a dictionary, which tends to be extremely large. I have no experience in using…
beyeran
4 votes, 1 answer

Why does the KNN implementation in Weka run faster?

1) As we know, KNN performs no computation in the training phase and instead defers all computation to classification, which is why we call it a lazy learner. It should take more time in classification than in training; however, I found this assumption almost…
Kashif Khan
4 votes, 3 answers

How can I classify text documents using SVM and KNN?

Almost all of the examples are based on numbers. In text documents I have words instead of numbers. So can you show me simple examples of how to use these algorithms for text document classification? I don't need a code example but just…
Furkan Gözükara
3 votes, 1 answer

Hugging Face Transformers BART CUDA error: CUBLAS_STATUS_NOT_INITIALIZE

I'm trying to fine-tune the Facebook BART model, following this article in order to classify text using my own dataset. I'm using the Trainer object in order to train: training_args = TrainingArguments( output_dir=model_directory, #…
3 votes, 0 answers

tfa.metrics.F1Score custom metrics error: "Shapes must be equal rank"

I am trying to add F1Score as a metric for a seq to seq classification task. The shapes of y_true and y_pred are the same but my custom metrics class keeps printing the following error: ValueError: Shapes must be equal rank, but are 1 and 2 for…
Tony
3 votes, 0 answers

Tensorflow text classification in R using 3 classes - Error in py_call_impl(callable, dots$args, dots$keywords)

I'm working on a text classification problem that classifies some tweets into one of three labels. I have two columns in my dataset: Score column with the value of 0 (negative), 1 (positive) or 2 (neutral) and Statement column with the tweet text. I…
3 votes, 3 answers

keras - Graph disconnected: cannot obtain value for tensor KerasTensor

I've been trying to create a 7-column (features) model with the Keras functional API and map it to a 6-class output. import tensorflow as tf from tensorflow.keras import Model from tensorflow.keras.layers import Input, Dense,…
Ben
3 votes, 1 answer

What do the logits and probabilities from RobertaForSequenceClassification represent?

Being new to the "Natural Language Processing" scene, I am experimentally learning and have implemented the following segment of code: from transformers import RobertaTokenizer, RobertaForSequenceClassification import torch path =…
Enigmatic
3 votes, 2 answers

How can I get all outputs of the last transformer encoder in a BERT pretrained model and not just the CLS token output?

I'm using pytorch and this is the model from huggingface transformers link: from transformers import BertTokenizerFast, BertForSequenceClassification bert = BertForSequenceClassification.from_pretrained("bert-base-uncased", …
3 votes, 2 answers

Unable to restore a layer of class TextVectorization - Text Classification

System information: Google Colab. When I run the example provided by the official TensorFlow basic text classification, everything runs fine until the model save, but when I load the model it gives me this error. RuntimeError: Unable to restore a layer…
3 votes, 4 answers

Fine-Tuning DistilBertForSequenceClassification: Is not learning, why is loss not changing? Weights not updated?

I am relatively new to PyTorch and Huggingface-transformers and experimented with DistilBertForSequenceClassification on this Kaggle-Dataset. from transformers import DistilBertForSequenceClassification import torch.optim as optim import torch.nn…
3 votes, 0 answers

RStudio --> Error: Python module tensorflow.keras was not found

I am working in RStudio and facing the below error related to Keras & Tensorflow. Error: Python module tensorflow.keras was not found. Detected Python configuration: python: …