Questions tagged [text-classification]

Put simply, text classification means assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that occurs in many real-world applications. For example, an automated call centre may want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

binary (binary classification)
one category out of k possible categories (multi-class)
a set of categories out of k possible categories (multi-label).
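
The three output types above can be sketched in plain Python. This is only an illustration: the category names, the example labels and the `to_indicator` helper are hypothetical and not taken from any particular library.

```python
# Hypothetical category set for a complaints classifier (k = 4).
CATEGORIES = ["billing", "delivery", "technical", "other"]

# Binary: a single yes/no decision per text.
binary_label = True  # e.g. "is this text a complaint?"

# Multi-class: exactly one category out of k.
multiclass_label = "billing"

# Multi-label: any subset of the k categories.
multilabel = {"billing", "technical"}


def to_indicator(labels, categories=CATEGORIES):
    """Encode a set of labels as a 0/1 indicator vector over the categories."""
    return [1 if c in labels else 0 for c in categories]


# One-hot vector: exactly one entry is 1.
multiclass_vector = to_indicator({multiclass_label})
# Multi-hot vector: any number of entries may be 1.
multilabel_vector = to_indicator(multilabel)
```

The indicator encoding is one common way to feed such labels to a learner; other encodings (for example integer class ids for the multi-class case) work as well.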

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
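
A minimal sketch of why such features are sparse, assuming a simple whitespace tokenizer and a bag-of-words representation (the toy documents and the `to_sparse` helper are made up for illustration). Each document uses only a few words of the whole vocabulary, so it is enough to store the non-zero counts.

```python
from collections import Counter

# Toy corpus (hypothetical complaint texts).
docs = [
    "my bill is wrong",
    "the parcel arrived late",
    "the app crashes on login",
]

# Vocabulary: every distinct token in the corpus, mapped to a column index.
vocab = {
    tok: i
    for i, tok in enumerate(sorted({t for d in docs for t in d.split()}))
}


def to_sparse(doc):
    """Return {column_index: count} for the tokens present in the document."""
    counts = Counter(doc.split())
    return {vocab[tok]: n for tok, n in counts.items() if tok in vocab}


sparse_rows = [to_sparse(d) for d in docs]

# Fraction of non-zero cells in the equivalent dense document-term matrix.
nonzero = sum(len(row) for row in sparse_rows)
density = nonzero / (len(docs) * len(vocab))
print(f"vocabulary size: {len(vocab)}, density: {density:.2f}")
```

With a realistic corpus the vocabulary grows to tens of thousands of terms while each document still contains only a few hundred of them, so the density drops far below what this toy example shows.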

1694 questions

votes

2 answers

Get corresponding classes to predict_proba (GridSearchCV sklearn)

I'm using GridSearchCV and a pipeline to classify some text documents. A code snippet: clf = Pipeline([('vect', TfidfVectorizer()), ('clf', SVC())]) parameters = {'vect__ngram_range' : [(1,2)], 'vect__min_df' : [2], 'vect__stop_words' :…

python scikit-learn text-classification

asked Jul 20 '15 at 08:39

Josefine

votes

2 answers

Using Topic Model, how should we set up a "stop words" list?

There are some standard stop lists, giving words like "a the of not" to be removed from corpus. However, I'm wondering, should the stop list change case by case? For example, I have 10K of articles from a journal, then because of the structure of an…

stop-words lda topic-modeling text-classification

asked Feb 24 '15 at 18:09

Ruby

votes

2 answers

How do I transform text into TF-IDF format using Weka in Java?

Suppose, I have following sample ARFF file with two attributes: (1) sentiment: positive [1] or negative [-1] (2) tweet: text @relation sentiment_analysis @attribute sentiment {1, -1} @attribute tweet string @data -1,'is upset that he can\'t update…

machine-learning weka sentiment-analysis arff text-classification

asked Oct 09 '14 at 18:38

Hitesh Dholaria

votes

1 answer

How to use pickled classifier with countVectorizer.fit_transform() for labeling data

I trained a classifier on a set of short documents and pickled it after getting reasonable f1 and accuracy scores for a binary classification task. While training, I reduced the number of features using a scikit-learn CountVectorizer cv: cv…

python scikit-learn text-classification

asked Sep 23 '14 at 21:02

Gaurav Tuli

votes

2 answers

Lexicon dictionary for synonym words

There are a few dictionaries available for natural language processing, such as dictionaries of positive and negative words. Is there any dictionary available which contains a list of synonyms for all dictionary words? Like for nice synonyms: enjoyable,…

dictionary nlp stanford-nlp data-processing text-classification

asked May 17 '14 at 10:27

user2129623


votes

1 answer

Can you recommend a package in R that can be used to count precision, recall and F1-score for multi class classification tasks

Is there any package that you would recommend which can be used to calculate the precision, F1, recall for multi class classification task in R. I tried to use ROCR but it states that: ROCR currently supports only evaluation of binary…

r text-classification precision-recall

asked Apr 08 '14 at 07:39

tanay

votes

1 answer

Fine-tuning a pretrained Spanish RoBERTa model for a different task, sentiment analysis

I'm doing sentiment analysis of Spanish tweets. After reviewing some of the recent literature, I've seen that there's been a most recent effort to train a RoBERTa model exclusively on Spanish text (roberta-base-bne). It seems to perform better than…

python tensorflow sentiment-analysis text-classification huggingface-transformers

asked Sep 27 '21 at 21:14

LeLuc

votes

1 answer

Resampling dataset for spam classification

I have a class imbalance problem with the following dataset: Text is_it_capital? is_it_upper? contains_num? Label an example of text 0 0 0 …

python scikit-learn classification text-classification resampling

asked Feb 17 '21 at 14:58

LdM

votes

1 answer

ALBERT not converging - HuggingFace

I'm trying to apply a pretrained HuggingFace ALBERT transformer model to my own text classification task, but the loss is not decreasing beyond a certain point. Here's my code: There are four labels in my text classification dataset which are: 0, 1,…

machine-learning nlp text-classification transformer-model huggingface-transformers

asked Jun 20 '20 at 18:36

beginner

votes

2 answers

Spacy TextCat Score in MultiLabel Classification

In spaCy's text classification train_textcat example, there are two labels specified: Positive and Negative. Hence the cats score is represented as cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels] I am working with…

spacy text-classification multilabel-classification

asked Jun 12 '20 at 08:03

Subha Maharjan

votes

1 answer

FastText 0.9.2 - why is recall 'nan'?

I trained a supervised model in FastText using the Python interface and I'm getting weird results for precision and recall. First, I trained a model: model = fasttext.train_supervised("train.txt", wordNgrams=3, epoch=100,…

python-3.x nlp text-classification precision-recall fasttext

asked May 14 '20 at 00:21

abstrakkt

votes

0 answers

How to handle text classification model that gives few results with higher confidence to wrong category?

I had a dataset of 15k records. I trained the model using a k-train package and 'bert' model with 5k samples. The train-test split is 70-30% and test results gave me accuracy and f1 scores as 93-94%. I felt the model is well trained, But on…

python machine-learning text-classification false-positive bert-language-model

asked May 12 '20 at 13:44

Giri Sai Ram

votes

1 answer

Difference between blank and pretrained models in spacy

I am currently trying to train a text classifier using spacy and I got stuck with following question: what is the difference between creating a blank model using spacy.blank('en') and using a pretrained model spacy.load('en_core_web_sm'). Just to…

python spacy text-classification

asked Mar 27 '20 at 14:28

Oleg Ivanytskyi

votes

1 answer

How to make a prediction as binary output? - Python (Tensorflow)

I'm learning text classification using movie reviews as data with tensorflow, but I got stuck when I get an output prediction different (not rounded, not binary) to the label. CODE predict = model.predict([test_review]) print("Prediction: " +…

python tensorflow prediction text-classification

asked Jan 28 '20 at 10:22

Y4RD13

votes

3 answers

Receiving, "An error was thrown and was not caught: The validation data provided must contain ..." when creating a Text Classifier Model with CreateML

I am using Playground to create a Text Classifier Model using CreateML and keep getting the error: Playground execution terminated: An error was thrown and was not caught: ▿ The validation data provided must contain class. ▿ type : 1 element -…

validation text-classification createml

asked Jan 07 '20 at 22:35

Jerry Rufe
