Questions tagged [text-classification]

Text classification is the task of assigning a piece of text to one or more categories, which are usually predefined. It is an important problem that occurs in many real-world applications. For example, an automated call centre might categorise incoming complaints into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

binary (binary classification)
one category out of k possible categories (multi-class)
a set of categories out of k possible categories (multi-label).
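
As a rough illustration of the three output types above, the targets could be encoded as follows. This is a minimal sketch in plain Python; the category names and the helpers one_hot and multi_hot are hypothetical and not taken from any library.

```python
# Hypothetical label set for a call-centre complaint classifier (illustrative only).
CATEGORIES = ["billing", "network", "hardware"]


def one_hot(label, categories):
    """Multi-class target: exactly one category out of k is active."""
    return [1 if c == label else 0 for c in categories]


def multi_hot(labels, categories):
    """Multi-label target: any subset of the k categories may be active."""
    chosen = set(labels)
    return [1 if c in chosen else 0 for c in categories]


# Binary: a single yes/no decision, e.g. "is this text a complaint?"
binary_target = 1

# Multi-class: one category out of k.
multi_class_target = one_hot("network", CATEGORIES)

# Multi-label: a set of categories out of k.
multi_label_target = multi_hot(["billing", "hardware"], CATEGORIES)

print(binary_target, multi_class_target, multi_label_target)
```

The only structural difference between the last two is that a multi-class target has exactly one active position, while a multi-label target may have several or none.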

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
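
To see why text features tend to be sparse, here is a minimal bag-of-words sketch using only the Python standard library. The example documents, the whitespace tokenizer, and the helper sparse_vector are assumptions made for illustration; real pipelines usually rely on a library vectorizer instead.

```python
from collections import Counter

# Made-up example documents (illustrative only).
docs = [
    "my internet connection keeps dropping",
    "i was charged twice on my bill",
    "the router will not turn on",
]


def tokenize(text):
    """Naive whitespace tokenizer; real systems use something more robust."""
    return text.lower().split()


# One feature per distinct token seen in the corpus.
vocabulary = sorted({tok for doc in docs for tok in tokenize(doc)})
index = {tok: i for i, tok in enumerate(vocabulary)}


def sparse_vector(text):
    """Map feature index -> count, storing only the non-zero entries."""
    counts = Counter(tokenize(text))
    return {index[tok]: n for tok, n in counts.items() if tok in index}


vectors = [sparse_vector(doc) for doc in docs]

# Fraction of non-zero entries in the full document-by-feature matrix.
density = sum(len(v) for v in vectors) / (len(docs) * len(vocabulary))
print(f"{len(vocabulary)} features, density {density:.3f}")
```

Each document touches only the few vocabulary entries it actually contains, so the density drops further as the vocabulary grows with the corpus.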

1694 questions

4 answers

How to find outliers in document classification with a million documents?

I have a million documents which belong to different classes (100 classes). I want to find outlier documents in each class (which don't belong to that class but were wrongly classified) and filter them out. I can do document similarity using cosine…

asked Dec 19 '19 at 09:57

Gaurav Chawla

2 answers

How to do sequence classification with pytorch nn.Transformer?

I am doing a sequence classification task using nn.TransformerEncoder(), whose pipeline is similar to nn.LSTM(). I have tried several temporal feature fusion methods: Selecting the final outputs as the representation of the whole sequence. Using…

machine-learning deep-learning pytorch text-classification transformer-model

asked Sep 25 '19 at 06:02

Whisht

1 answer

Finetuning BERT on custom data

I want to train a 21-class text classification model using BERT. But I have very little training data, so I downloaded a similar dataset with 5 classes and 2 million samples. I then fine-tuned on the downloaded data with the uncased pretrained model provided by…

tensorflow deep-learning nlp text-classification bert-language-model

asked May 04 '19 at 05:40

danishansari

2 answers

How to represent ELMo embeddings as a 1D array?

I am using the language model ELMo (https://allennlp.org/elmo) to represent my text data as a numerical vector. This vector will be used as training data for a simple sentiment analysis task. In this case the data is not in English, so I downloaded…

machine-learning nlp classification text-classification word-embedding

asked Oct 30 '18 at 09:45

Isbister

1 answer

Cannot freeze TensorFlow models into a frozen (.pb) file

I am following (here) to freeze models into a .pb file. My model is a CNN for text classification; I am using the (Github) link to train the CNN and export it in the form of models. I have trained the models for 4 epochs and my checkpoints folders…

python python-3.x tensorflow text-classification tensorflow-serving

asked Jul 27 '18 at 01:23

Ajinkya

1 answer

python LightGBM text classification with Tfidf

I'm trying to introduce LightGBM for text multiclassification. 2 columns in pandas dataframe, where 'category' and 'contents' are set as follows. Dataframe: contents category 1 this is example1... A 2 this is…

python tf-idf text-classification lightgbm

asked May 09 '18 at 09:53

SY9

1 answer

Accuracy below 50% for binary classification

I am training a Naive Bayes classifier on a balanced dataset with equal number of positive and negative examples. At test time I am computing the accuracy in turn for the examples in the positive class, negative class, and the subsets which make up…

machine-learning binary floating-accuracy text-classification

asked May 03 '18 at 14:33

Crista23

0 answers

How does TF-IDF handle missing values?

I am working on a classification problem in which I have to classify product category based on the information of the product like title, description and other attributes. It is working for different categories but getting biased in closed…

python-3.x tf-idf text-classification

asked Jan 04 '18 at 04:57

Sumit S Chawla

1 answer

ValueError: Variable Embedding already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined

Based on this GitHub link: https://github.com/brightmart/text_classification/tree/master/a03_TextRNN. When I train a03_TextRNN with google_news_wor22vec.bin and a text file with my documents + labels, I get these errors: How can I solve this…

classification text-classification recurrent-neural-network multilabel-classification fasttext

asked Dec 08 '17 at 15:26

brelian

1 answer

text classification of large dataset in python

I have 2.2 million data samples to classify into more than 7500 categories. I am using pandas and scikit-learn in Python to do so. Below is a sample of my dataset: itemid description category 11802974…

python pandas scikit-learn large-data text-classification

asked Dec 03 '17 at 15:19

Ranjana Girish

2 answers

Make a prediction using mxnet CNN model

Hi, I'm a newbie to data science. I followed this tutorial, https://mxnet.incubator.apache.org/tutorials/nlp/cnn.html, but I am confused about how to make a single prediction using the trained model generated by the above-mentioned tutorial. Please…

python conv-neural-network data-science text-classification mxnet

asked Sep 21 '17 at 10:01

Zann

1 answer

Difference between TaggedDocument and TaggedLineDocument in gensim? and How to work with files in a directory?

I am new to doc2vec and I wish to classify a set of texts using it. I am confused about TaggedDocument and TaggedLineDocument. 1) What is the difference between the two? Is it that TaggedLineDocument is a collection of TaggedDocuments? 2) If I have a…

nlp gensim word2vec text-classification doc2vec

asked Jul 11 '17 at 23:34

dfault

1 answer

Specifying the # of hidden units in Facebook fasttext

In the paper on fasttext for supervised classification, the authors specified various quantities of hidden units by altering some parameter (h is the one on pages 3,4 - In table 1 you see "It has 10 hidden units and we evaluate it with and without…

facebook nlp text-classification fasttext

asked May 22 '17 at 14:33

Adam P.

1 answer

Can I retrain an old model with new data using TensorFlow?

I am new to TensorFlow and I am just trying to see if my idea is even possible. I have trained a model with multi class classifier. Now I can classify a sentence in input, but I would like to change the result of CNN, for example, to improve the…

tensorflow classification text-classification training-data

asked Mar 29 '17 at 09:15

Developer

1 answer

MultinomialNB - Theory vs practice

OK so I'm just studying Andrew Ng's Machine Learning course. I'm currently reading this chapter and want to try the Multinomial Naive Bayes (bottom of page 12) for myself using SKLearn and Python. So Andrew proposes a method, in which each email in…

python machine-learning scikit-learn text-classification multinomial

asked Feb 09 '17 at 16:48

lte__
