Questions tagged [text-classification]

Simply put, text classification means assigning a piece of text to one or more categories from a (mostly predefined) set. It is an important problem that occurs in many real-world applications. For example, an automated call centre might want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

binary (binary classification)
one category out of k possible categories (multi-class)
a set of categories out of k possible categories (multi-label).
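
To make the three output types concrete, here is a minimal sketch. It assumes some classifier has already produced a score per category; the `scores` dictionary, the category names, and the 0.5 threshold are hypothetical values chosen for illustration, not part of any particular library.

```python
from typing import Dict, Set


def binary_output(scores: Dict[str, float], positive: str, threshold: float = 0.5) -> bool:
    """Binary: does the text belong to one given category or not?"""
    return scores.get(positive, 0.0) >= threshold


def multiclass_output(scores: Dict[str, float]) -> str:
    """Multi-class: exactly one category out of k, here the highest-scoring one."""
    return max(scores, key=scores.get)


def multilabel_output(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Multi-label: every category whose score reaches the threshold."""
    return {category for category, score in scores.items() if score >= threshold}


# Hypothetical scores for one call-centre complaint (assumed, not real data).
scores = {"billing": 0.7, "network": 0.6, "hardware": 0.1}

is_billing = binary_output(scores, "billing")
best_category = multiclass_output(scores)
all_categories = multilabel_output(scores)
```

The same scores yield a yes/no answer, a single category, or a set of categories, depending only on how the output is read.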

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
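
The sparsity can be seen in a small bag-of-words sketch. It assumes whitespace tokenisation and raw word counts; the helper names `build_vocabulary` and `to_sparse` and the three example documents are made up for illustration. Each document is stored as a mapping from feature index to count, so words that do not occur take no space.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(docs: List[str]) -> Dict[str, int]:
    """Assign each distinct lowercased token a feature index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_sparse(doc: str, vocab: Dict[str, int]) -> Dict[int, int]:
    """Bag-of-words vector holding only the non-zero counts: {feature index: count}."""
    counts = Counter(token for token in doc.lower().split() if token in vocab)
    return {vocab[token]: count for token, count in counts.items()}


# Hypothetical example documents.
docs = [
    "my internet connection is down",
    "the bill is wrong",
    "internet bill is too high",
]

vocab = build_vocabulary(docs)
vectors = [to_sparse(doc, vocab) for doc in docs]

# Fraction of non-zero entries in the full document-by-vocabulary matrix.
density = sum(len(vector) for vector in vectors) / (len(docs) * len(vocab))
```

With a realistic vocabulary of tens of thousands of words, each document touches only a tiny fraction of the features, which is why sparse representations are the usual choice.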

1694 questions

1 answer

Why won't my neural network train?

I have gathered over 20,000 legal pleadings in PDF format. I am an attorney, but I also write computer programs in MFC/VC++ to help with my practice. I'd like to learn to use neural networks (unfortunately my math skills are limited to college…

neural-network text-classification

asked Oct 11 '15 at 23:52

yzcyxisyxuyz

1 answer

Automatic classification items in the store, is it possible?

I have a database of items in a store. All of them are vegetables, fruits, nuts, berries, etc. I need to categorise them. For example, different types of potatoes should go under a single group, potato; tomatoes under tomato; etc. The most…

neural-network classification categories text-classification

asked Oct 07 '15 at 18:52

Anatoly

1 answer

How to combine multiple feature sets in bag of words

I have text classification data with predictions depending on categories, 'descriptions' and 'components'. I could do the classification using bag of words in Python with scikit-learn on 'descriptions'. But I want to get predictions using both…

python-2.7 machine-learning scikit-learn text-mining text-classification

asked Sep 30 '15 at 06:41

javi_p

1 answer

How to Unit test a Naive Bayes word classifier?

I wrote a simple Naive Bayes word classifier. In simple terms, what it does is ... train( "some text A ...", "categoryA" ); train( "some text A ...", "categoryA" ); train( "some text B ...", "categoryB" ); train( "some text B ...", "categoryB"…

unit-testing machine-learning text-classification naivebayes

asked Sep 20 '15 at 09:18

FFMG

2 answers

An evaluation of text classification method with Reuters-21578 dataset

Please do not block me for this question. I tried to find the answer for about a month and could not find it, and you are my last hope (please, if you want to report it, answer me first and then report, thanks). I write a hybrid text classification…

text dataset evaluation text-classification reuters

asked Sep 18 '15 at 06:24

deansam

0 answers

Suggestions or Ideas on how to join the naive bayes model from training data and test data on hadoop

I built my Naive Bayes classifier for text classification in Java. Now I am trying to port it to Hadoop. I have built the model using a mapper and reducer, and the output is like: label1,word1 count label1,word2 count label1,word3 …

java hadoop text-classification naivebayes

asked Sep 15 '15 at 18:15

Nicky

1 answer

Document Tagging with Named Topics, relevant literature? (Also asked on Quora)

I am working on what is to me a very new domain in data science and would like to know if anyone can suggest any existing academic literature that has relevant approaches that address my problem. The problem setting is as follows: I have a set of…

machine-learning nlp classification tagging text-classification

asked Sep 13 '15 at 17:05

Nikhil

1 answer

Stanford classifier - Why?

Given that Stanford Classifier is relatively new, what added value does it offer to users of Weka or RapidMiner working on text ML?

weka stanford-nlp rapidminer text-classification

asked Sep 10 '15 at 09:15

user1439579

1 answer

How to create a word map for custom text for text classification in R?

I am trying to implement a text classification program in R that classifies input text (args) into 3 different classes. I have successfully tested the sample program by dividing the input data into training and test data. I would now like to build…

r tm knn text-classification

asked Sep 02 '15 at 20:22

Ankit Sharma

1 answer

Adding custom features to Stanford NER without touching the source code

I have added custom features to my Stanford NER model as suggested in the following link: Stanford-NER customization to classify software programming keywords. I was wondering, is there any better approach at the moment to add custom features…

java nlp stanford-nlp text-classification

asked Sep 02 '15 at 10:13

Rohan Surdikar

1 answer

Classify pdf files upon their name

I have a list of PDF files (their names). For example, Financial_Statement_Q1_2015_En belongs to Quarterly Report, and Financial_Statement_Yealy_2015 belongs to Not Quarterly Report. I need to classify the PDF names as Quarterly or Not Quarterly…

weka text-classification

asked Aug 28 '15 at 14:20

Raja

0 answers

Single label train set to produce a multilabel output scikit-learn one vs rest

I was wondering whether it is possible to use a single-label training set to produce a multilabel output, using the modified scikit-learn example below. The training set contains a number of sentences, each labelled either London or NY. At the moment,…

python scikit-learn text-classification multilabel-classification

asked Aug 14 '15 at 09:51

ulrich

1 answer

Get category from text or keywords

I managed so far to cluster and identify "trending topics" from tweets using 3 different approaches (LDA, SVD and k-means) with k=12. The problem now is to give a category to these topics. I used Alchemy API for text categorization. However, I am…

algorithm twitter nlp text-mining text-classification

asked Aug 03 '15 at 16:24

user4658980

2 answers

Random Forest for multi-label classification

I am making an application for multilabel text classification. I've tried different machine learning algorithms. No doubt the SVM with a linear kernel gets the best results. I have also tried the Random Forest algorithm and the results…

python machine-learning svm random-forest text-classification

asked Jul 04 '15 at 23:30

Blunt

1 answer

How to do online classification in Apache Mahout?

I have a big data set that I use to train a naive classifier using Apache Mahout. I use the classifier to classify a bunch of documents (this is like my test set). The way I classify documents is as follows: I find the normalized tf-idf vectors for…

classification mahout text-classification document-classification

asked Jul 02 '15 at 21:49

HHH