Questions tagged [text-classification]

Simply stated, text classification is the task of assigning a piece of text to one or more categories from a (mostly predefined) set. It is an important problem that arises in many real-world applications. For example, an automated call centre might want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a special case of the more general classification problem. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

one of two categories (binary classification)
one category out of k possible categories (multi-class classification)
a set of categories out of k possible categories (multi-label classification).

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
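To make the sparsity point concrete, here is a minimal sketch in plain Python. It assumes a simple bag-of-words representation with whitespace tokenization; the example complaints and the helper names (tokenize, to_sparse, density) are made up for illustration and are not taken from any particular library.

```python
from collections import Counter

# Hypothetical call-centre complaints, invented for this sketch.
docs = [
    "my internet connection keeps dropping",
    "I was charged twice on my bill",
    "the router will not connect to the internet",
]

def tokenize(text):
    # Assumption: lowercase and split on whitespace; real pipelines do more.
    return text.lower().split()

# One feature (column) per distinct word seen in the corpus.
vocabulary = sorted({tok for doc in docs for tok in tokenize(doc)})
index = {tok: i for i, tok in enumerate(vocabulary)}

def to_sparse(text):
    # Store only the non-zero counts as {column index: count}.
    counts = Counter(tokenize(text))
    return {index[tok]: n for tok, n in counts.items() if tok in index}

sparse_rows = [to_sparse(doc) for doc in docs]

def density(rows, n_features):
    # Fraction of entries in the document-term matrix that are non-zero.
    nonzero = sum(len(row) for row in rows)
    return nonzero / (len(rows) * n_features)

print(len(vocabulary), "features;", f"density = {density(sparse_rows, len(vocabulary)):.2f}")
```

Each document touches only a few of the columns, and the fraction of non-zero entries shrinks further as the vocabulary grows with the corpus. This is why text features are typically stored in a sparse format rather than as dense arrays.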

1694 questions

2 answers

How Information Gain Works in Text Classification

I have to learn information gain for feature selection right now, but I don't have a clear understanding of it. I am a newbie, and I'm confused about it. How do I use IG in feature selection (manual calculation)? I just have this clue .. That have…

text text-classification information-theory

asked Dec 15 '16 at 07:53

Wiwik Setyaningsih

1 answer

scikit-learn classification using doc2vec representation

I want to classify text documents using a doc2vec representation and scikit-learn models. My problem is that I'm lost on how to get started. Can someone explain the general steps usually taken to use doc2vec with scikit-learn?

machine-learning scikit-learn text-classification doc2vec

asked Nov 27 '16 at 20:19

MikeAlbert

1 answer

R: how to use random forests to predict binary outcome using string variables?

Consider the following dataframe outcome <- c(1,0,0,1,1) string <- c('I love pasta','hello world', '1+1 = 2','pasta madness', 'pizza madness') df = df=data.frame(outcome,string) > df outcome string 1 1 I love pasta 2 0 …

r machine-learning classification random-forest text-classification

asked Oct 21 '16 at 14:10

ℕʘʘḆḽḘ

3 answers

Text classification using e1071 (SVM)

I have a dataframe with two columns. One column contains text; each row of that column contains some type of data from three different classes (skill, qualification, experience), and the other column holds the respective class labels. Snapshot of the…

r svm text-classification multilabel-classification

asked Oct 14 '16 at 20:32

user2252882

1 answer

Addressing synonyms in Supervised Learning for Text Classification

I am using a scikit-learn supervised learning method for text classification. I have a training dataset with input text fields and the categories they belong to. I use a tf-idf and SVM classifier pipeline for creating the model. The solution works well for…

machine-learning scikit-learn text-classification supervised-learning

asked Oct 07 '16 at 05:19

Shamy

3 answers

Setting up a MLP for binary classification with tensorflow

I am having some trouble trying to set up a multilayer perceptron for binary classification using tensorflow. I have a very large dataset (about 1.5*10^6 examples), each with a binary (0/1) label and 100 features. What I need to do is to set up a simple…

machine-learning tensorflow deep-learning text-classification

asked Oct 02 '16 at 14:23

Darkobra

1 answer

Using Keras for text classification

I am struggling to approach the bag of words / vocabulary method for representing my input data as one hot vectors for my neural net model in keras. I would like to build a simple 3 layer network but I need help in understanding and developing an…

python nlp keras text-classification

asked Aug 21 '16 at 02:13

Moey Zf

1 answer

Text2Vec classification with caret problems

Some context: Working with text classification and big sparse matrices in R I have been working on a text multi-class classification problem with the text2vec package and caret. The plan is to use text2vec for building the document-term matrix,…

r svm r-caret text-classification text2vec

asked Aug 04 '16 at 13:19

Ed.

1 answer

Issues using scikit to for multi-label data

I'm using the following code for multi-label data classification: import numpy as np from sklearn.pipeline import Pipeline from sklearn.feature_extraction.text import CountVectorizer from sklearn.svm import LinearSVC from…

python machine-learning text-classification multilabel-classification

asked Dec 28 '15 at 15:49

user4069366

1 answer

How to change data of a corpus to appropriate format for training with 'caret' package in R?

Q-1. How to change the data of a corpus to the appropriate format for training with the 'caret' package? First of all, I would like to describe the environment for this question, and then I will show you where I am stuck. Environments This is corpus that is…

r text-mining r-caret text-classification document-classification

asked Dec 15 '15 at 20:29

user5152421

1 answer

Sklearn other inputs in addition to text for text classification

I am trying to build a text classifier using scikit-learn's bag-of-words vectorization feeding into a classifier. However, I was wondering how I would add another variable to the input apart from the text itself. Say I want to add a number of words in the…

python scikit-learn classification words text-classification

asked Dec 08 '15 at 17:23

Goodie123

1 answer

TextClassification with TextBlob

I'm a complete newbie in Machine Learning, NLP, and Data Analysis, but I'm very motivated to understand it better. I'm reading a couple of books on NLTK, scikit-learn etc. I discovered a python module "TextBlob" and found it to be super easy to get started…

machine-learning nltk sentiment-analysis text-classification textblob

asked Nov 29 '15 at 07:25

dpnishant

1 answer

"Combine" TF-IDF scores for single class of documents within corpus

Let's say I've calculated the TF-IDF scores for a corpus of documents, resulting in a matrix of TF-IDF features. If a subset of those documents are of a certain class, can I somehow "combine" the scores of that subset to get a single value for each…

machine-learning nlp tf-idf text-classification

asked Sep 02 '15 at 03:16

Andrew LaPrise

2 answers

Detect (predefined) topics in natural text

Is there a library or database out there that can detect the topics of natural text? I'm not talking about generating topics from extracted keywords, but about analysing the used vocabulary and matching it with predefined topics. Like searching for…

nlp text-classification information-extraction

asked Jun 08 '15 at 14:57

snøreven

1 answer

How to correctly override and call super-method in Python

First, the problem at hand. I am writing a wrapper for a scikit-learn class, and am having problems with the right syntax. What I am trying to achieve is an override of the fit_transform function, which alters the input only slightly, and then calls…

python scikit-learn classification text-classification

asked Apr 22 '15 at 15:34

Arne
