Questions tagged [text-classification]

Simply put, text classification means assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that arises in many real-world applications. For example, an automated call centre might want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • binary (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).
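As a rough illustration of these three output shapes, here is a minimal sketch. The category names and helper functions are hypothetical, made up for a complaint-routing example like the one above, and are not taken from any library.

```python
# Hypothetical category set for a complaint-routing example (an assumption).
CATEGORIES = ["billing", "network", "hardware"]


def encode_multiclass(label):
    """Multi-class: exactly one of k categories, encoded as its index."""
    return CATEGORIES.index(label)


def encode_multilabel(labels):
    """Multi-label: any subset of the k categories, as a 0/1 indicator vector."""
    return [1 if category in labels else 0 for category in CATEGORIES]


# Binary: a single yes/no target, e.g. complaint (1) vs. not a complaint (0).
binary_target = 1

# Multi-class: the text belongs to exactly one category.
multiclass_target = encode_multiclass("network")

# Multi-label: the text may belong to several categories at once.
multilabel_target = encode_multilabel({"billing", "hardware"})

print(binary_target, multiclass_target, multilabel_target)
```

The indicator-vector form for multi-label targets is one common convention; other encodings (for example, a list of label names per text) carry the same information.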

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
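To see why text features tend to be sparse, consider a bag-of-words representation, where each vocabulary word is one feature and a given text uses only a few of them. The sketch below uses only the Python standard library. The example sentences, whitespace tokenisation, and function names are assumptions chosen for illustration.

```python
from collections import Counter

# Toy training texts (made up for this example).
docs = [
    "my internet connection keeps dropping",
    "i was charged twice on my bill",
]

# The vocabulary is every distinct token seen in the training texts.
vocabulary = sorted({token for doc in docs for token in doc.split()})
token_index = {token: i for i, token in enumerate(vocabulary)}


def bag_of_words(text):
    """Return a sparse vector as {feature_index: count}; absent words are not stored."""
    counts = Counter(t for t in text.split() if t in token_index)
    return {token_index[t]: c for t, c in counts.items()}


def density(sparse_vec):
    """Fraction of vocabulary features that are non-zero for one text."""
    return len(sparse_vec) / len(vocabulary)


vec = bag_of_words("my bill my bill")
print(vec, density(vec))
```

With a realistic vocabulary of tens of thousands of words, a single short text touches a tiny fraction of the features, which is why sparse storage is the usual choice.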

1694 questions
6
votes
1 answer

How to handle text classification problems when multiple features are involved

I am working on a text classification problem with multiple text features and need to build a model to predict salary range. Please refer to the sample dataset. Most of the resources/tutorials deal with feature extraction on only one column and then…
Chetan Ambi
6
votes
2 answers

Sklearn Pipeline ValueError: could not convert string to float

I'm playing around with sklearn and NLP for the first time, and thought I understood everything I was doing up until I didn't know how to fix this error. Here is the relevant code (largely adapted from…
Mike
6
votes
1 answer

How do I determine the binary class predicted by a convolutional neural network on Keras?

I'm building a CNN to perform sentiment analysis on Keras. Everything is working perfectly, the model is trained and ready to be launched to production. However, when I try to predict on new unlabelled data by using the method model.predict() it…
6
votes
1 answer

How I can get the vectors for words that were not present in word2vec vocabulary?

I have checked the previous post link but it doesn't seem to work for my case: I have a pre-trained word2vec model: import gensim model = Word2Vec.load('w2v_model') Now I have a pandas dataframe with…
James
6
votes
2 answers

Which decision_function_shape for sklearn.svm.SVC when using OneVsRestClassifier?

I am doing multi-label classification where I am trying to predict the correct tags for questions (X = questions, y = list of tags for each question from X). I am wondering which decision_function_shape for sklearn.svm.SVC should be used with…
6
votes
1 answer

Simple text classification using naive bayes (weka) in java

I am trying to do text classification with the Weka naive Bayes library in my Java code, but I think the classification result is not correct, and I don't know what the problem is. I use an ARFF file for the input. This is my training data: @relation…
6
votes
2 answers

Defining vocabulary size in text classification

I have a question about defining the vocabulary set needed for feature extraction in text classification. In an experiment, there are two approaches I can think of: 1. Define vocabulary size using both training data and test data, so that no…
antande
6
votes
1 answer

How to train a naive bayes classifier with pos-tag sequence as a feature?

I have two classes of sentences. Each has a reasonably distinct POS-tag sequence. How can I train a naive Bayes classifier with the POS-tag sequence as a feature? Does Stanford CoreNLP/NLTK (Java or Python) provide any method for building a classifier…
6
votes
1 answer

How to rank features by their importance in a Weka classifier?

I used Weka to successfully build a classifier. I would now like to evaluate how effective or important my features are. For this I use AttributeSelection, but I don't know how to output the different features with their corresponding importance. I…
6
votes
2 answers

N-grams vs other classifiers in text categorization

I'm new to text categorization techniques. I want to know the difference between the N-gram approach to text categorization and text categorization based on other classifiers (decision tree, KNN, SVM). I want to know which one is better, does n-grams…
6
votes
1 answer

Natural Language Processing - Converting Text Features Into Feature Vectors

So I've been working on a natural language processing project in which I need to classify different styles of writing. Assuming that semantic features from texts have already been extracted for me, I plan to use Weka in Java to train SVM classifiers…
myrocks2
5
votes
1 answer

How to add more features in multi text classification?

I have a retail dataset with product_description, price, supplier, category as columns. I used product_description as feature: from sklearn import model_selection, preprocessing, naive_bayes # split the dataset into training and validation datasets…
Snow
5
votes
1 answer

One class SVM model for text classification (scikit-learn)

I am attempting to classify a train set of texts to be used for predicting similar texts in the test set of texts. I am using the one_class_svm model. 'author_corpus' contains a list of texts written by a single author and 'test_corpus' contains a…
5
votes
2 answers

How to continue training after loading model on multiple GPUs in Tensorflow 2.0 with Keras API?

I trained a text classification model consisting of an RNN in Tensorflow 2.0 with the Keras API. I trained this model on multiple GPUs (2) using tf.distribute.MirroredStrategy() from here. I saved the checkpoint of the model using…
5
votes
2 answers

Don't understand the HashingVectorizer from sklearn

I'm using the HashingVectorizer function from sklearn.feature_extraction.text but I do not understand how it works. My code: from sklearn.feature_extraction.text import HashingVectorizer corpus = [ 'This is the first document.', 'This document is the…