Questions tagged [text-classification]

Simply put, text classification is the task of assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that occurs in many real-world applications. For example, an automated call centre might want to sort incoming complaints automatically into the most appropriate problem category.

Text classification is a special case of the more general problem of classification. Here the input is a piece of text (rather than an image, sound, video, etc.). The output could be:

one of two categories (binary classification)
one category out of k possible categories (multi-class classification)
a set of categories out of k possible categories (multi-label classification).
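
As a rough illustration of the three output types listed above, the targets could be encoded as follows. This is only a sketch: the category names and the helper functions `one_hot` and `multi_hot` are hypothetical and not taken from any particular library.

```python
# Hypothetical categories for a call-centre complaint classifier (assumption).
CATEGORIES = ["billing", "network", "device"]


def one_hot(label, categories):
    """Encode exactly one category as a 0/1 vector (multi-class target)."""
    return [1 if c == label else 0 for c in categories]


def multi_hot(labels, categories):
    """Encode any subset of categories as a 0/1 vector (multi-label target)."""
    chosen = set(labels)
    return [1 if c in chosen else 0 for c in categories]


# Binary: a single yes/no decision, e.g. "is this text a complaint?"
binary_target = 1

# Multi-class: exactly one of the k categories applies.
multi_class_target = one_hot("network", CATEGORIES)

# Multi-label: any number of the k categories may apply at once.
multi_label_target = multi_hot(["billing", "device"], CATEGORIES)

print(binary_target, multi_class_target, multi_label_target)
```

The multi-class vector always contains a single 1, while the multi-label vector may contain several (or none).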

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
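
To see why text features tend to be sparse, here is a minimal bag-of-words sketch in plain Python. The three example documents and the whitespace tokenisation are assumptions made for illustration; real pipelines typically use a library vectorizer and a much larger vocabulary, which makes the effect far stronger.

```python
from collections import Counter

# Toy corpus (assumption): three short call-centre complaints.
docs = [
    "my bill is wrong",
    "the network is down again",
    "my phone screen is cracked",
]

# The vocabulary is every distinct word seen in the corpus.
vocabulary = sorted({word for doc in docs for word in doc.split()})


def bag_of_words(doc, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]


# One row per document, one column per vocabulary word.
matrix = [bag_of_words(doc, vocabulary) for doc in docs]

total = len(matrix) * len(vocabulary)
nonzero = sum(1 for row in matrix for value in row if value != 0)
sparsity = 1 - nonzero / total

print(f"{len(vocabulary)} features, {nonzero} of {total} entries are non-zero")
```

Each document uses only a few of the vocabulary words, so most entries in its feature vector are zero. This is why sparse matrix formats are commonly used for text features.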

1694 questions

1 answer

How to handle text classification problems when multiple features are involved

I am working on a text classification problem with multiple text features and need to build a model to predict salary range. Please refer to the sample dataset. Most of the resources/tutorials deal with feature extraction on only one column and then…

asked Dec 26 '18 at 07:56

Chetan Ambi

2 answers

Sklearn Pipeline ValueError: could not convert string to float

I'm playing around with sklearn and NLP for the first time, and thought I understood everything I was doing up until I didn't know how to fix this error. Here is the relevant code (largely adapted from…

python scikit-learn nlp text-classification

asked Aug 31 '18 at 21:59

Mike

1 answer

How do I determine the binary class predicted by a convolutional neural network on Keras?

I'm building a CNN to perform sentiment analysis on Keras. Everything is working perfectly, the model is trained and ready to be launched to production. However, when I try to predict on new unlabelled data by using the method model.predict() it…

python machine-learning keras deep-learning text-classification

asked Aug 25 '18 at 15:22

RFTexas

1 answer

How I can get the vectors for words that were not present in word2vec vocabulary?

I have checked the previous post link, but it doesn't seem to work for my case. I have a pre-trained word2vec model: import gensim model = Word2Vec.load('w2v_model') Now I have a pandas dataframe with…

python-3.x pandas word2vec gensim text-classification

asked Jul 04 '18 at 07:49

James

2 answers

Which decision_function_shape for sklearn.svm.SVC when using OneVsRestClassifier?

I am doing multi-label classification where I am trying to predict correct tags for questions (X = questions, y = list of tags for each question from X). I am wondering which decision_function_shape for sklearn.svm.SVC should be used with…

python scikit-learn svm text-classification multilabel-classification

asked Apr 19 '17 at 20:26

PeterB

1 answer

Simple text classification using naive bayes (weka) in java

I am trying to do text classification with Naive Bayes using the Weka library in my Java code, but I think the result of the classification is not correct, and I don't know what the problem is. I use an ARFF file for the input. This is my training data: @relation…

java weka text-classification naivebayes arff

asked Jan 30 '17 at 11:48

Muhammad Haryadi Futra

2 answers

Defining vocabulary size in text classification

I have a question about how to define the vocabulary set needed for feature extraction in text classification. In an experiment, there are two approaches I can think of: 1. Define vocabulary size using both training data and test data, so that no…

machine-learning nlp text-classification

asked Jul 02 '16 at 02:44

antande

1 answer

How to train a naive bayes classifier with pos-tag sequence as a feature?

I have two classes of sentences. Each has a reasonably distinct POS-tag sequence. How can I train a Naive Bayes classifier with the POS-tag sequence as a feature? Does Stanford CoreNLP/NLTK (Java or Python) provide any method for building a classifier…

machine-learning nltk stanford-nlp text-classification naivebayes

asked Feb 27 '15 at 11:50

kundan

1 answer

How to rank features by their importance in a Weka classifier?

I used Weka to successfully build a classifier. I would now like to evaluate how effective or important my features are. For this I use AttributeSelection. But I don't know how to output the different features with their corresponding importance. I…

machine-learning nlp weka feature-selection text-classification

asked Jan 21 '14 at 20:05

khadre

2 answers

N-grams vs other classifiers in text categorization

I'm new to text categorization techniques. I want to know the difference between the N-gram approach to text categorization and text categorization based on other classifiers (decision tree, KNN, SVM). I want to know which one is better, does n-grams…

machine-learning data-mining classification n-gram text-classification

asked Dec 01 '13 at 18:54

wudpecker

1 answer

Natural Language Processing - Converting Text Features Into Feature Vectors

So I've been working on a natural language processing project in which I need to classify different styles of writing. Assuming that semantic features from texts have already been extracted for me, I plan to use Weka in Java to train SVM classifiers…

java nlp svm text-classification

asked May 29 '13 at 20:54

myrocks2

1 answer

How to add more features in multi text classification?

I have a retail dataset with product_description, price, supplier, category as columns. I used product_description as feature: from sklearn import model_selection, preprocessing, naive_bayes # split the dataset into training and validation datasets…

python-3.x scikit-learn text-classification supervised-learning

asked Aug 10 '20 at 09:17

Snow

1 answer

One class SVM model for text classification (scikit-learn)

I am attempting to fit a model on a training set of texts and use it to predict similar texts in a test set. I am using the one_class_svm model. 'author_corpus' contains a list of texts written by a single author and 'test_corpus' contains a…

python-3.x machine-learning scikit-learn text-classification one-class-classification

asked Feb 29 '20 at 06:00

MythKhan

2 answers

How to continue training after loading model on multiple GPUs in Tensorflow 2.0 with Keras API?

I trained a text classification model consisting of an RNN in TensorFlow 2.0 with the Keras API. I trained this model on multiple GPUs (2) using tf.distribute.MirroredStrategy() from here. I saved the checkpoint of the model using…

tensorflow text-classification tensorflow2.0 multiple-gpu

asked Aug 06 '19 at 12:45

Rishabh Sahrawat

2 answers

Don't understand the HashingVectorizer from sklearn

I'm using the HashingVectorizer function from sklearn.feature_extraction.text but I do not understand how it works. My code: from sklearn.feature_extraction.text import HashingVectorizer corpus = [ 'This is the first document.', 'This document is the…

python-3.x scikit-learn nlp vectorization text-classification

asked May 23 '19 at 12:53

Toni Garcia
