Questions tagged [text-classification]

Simply put, text classification is about assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that occurs in many real-world applications. For example, an automated call centre might want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here, the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • binary (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).
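
The three output types listed above can be illustrated with a minimal sketch. The category names and the `to_indicator` helper are hypothetical, chosen only to match the call-centre example; they are not part of any particular library.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical label set for a call-centre complaint classifier (k = 4).
CATEGORIES = ("billing", "network", "hardware", "other")

# Binary: a single yes/no decision, e.g. "is this text a complaint?"
binary_output: bool = True

# Multi-class: exactly one category out of the k possible categories.
multi_class_output: str = "billing"

# Multi-label: any subset of the k possible categories (possibly empty).
multi_label_output: FrozenSet[str] = frozenset({"billing", "network"})


def to_indicator(labels: Iterable[str]) -> List[int]:
    """Encode a set of labels as a 0/1 vector with one position per category."""
    label_set = set(labels)
    return [1 if category in label_set else 0 for category in CATEGORIES]


# A multi-class answer is the special case with exactly one position set.
print(to_indicator({multi_class_output}))
print(to_indicator(multi_label_output))
```

Seen this way, multi-class output is a multi-label indicator vector constrained to have exactly one non-zero entry.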

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
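
To see why text features tend to be sparse, here is a rough sketch of a bag-of-words representation using only the Python standard library. The example documents, the whitespace tokenisation, and the helper names `bag_of_words` and `density` are assumptions made for illustration, not a prescribed method.

```python
from collections import Counter

# Hypothetical example documents; any small corpus would do.
docs = [
    "my internet connection keeps dropping",
    "the invoice amount is wrong",
    "my router will not turn on",
]

# Vocabulary: every distinct word in the corpus gets one feature index.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
index = {word: i for i, word in enumerate(vocabulary)}


def bag_of_words(text):
    """Return a sparse vector as {feature_index: count}, storing non-zeros only."""
    counts = Counter(text.lower().split())
    return {index[w]: c for w, c in counts.items() if w in index}


def density(vector):
    """Fraction of the vocabulary that is non-zero in this vector."""
    return len(vector) / len(vocabulary)


vectors = [bag_of_words(doc) for doc in docs]
for doc, vec in zip(docs, vectors):
    print(f"{len(vec)} of {len(vocabulary)} features non-zero ({density(vec):.0%}): {doc}")
```

Each document uses only a handful of the words in the vocabulary, and the fraction shrinks further as the corpus (and therefore the vocabulary) grows. Storing only the non-zero entries, as the dictionary above does, is the same idea that sparse matrix formats rely on.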

1694 questions
0
votes
1 answer

Why won't my neural network train?

I have gathered over 20,000 legal pleadings in PDF format. I am an attorney, but I also write computer programs in MFC/VC++ to help with my practice. I'd like to learn to use neural networks (unfortunately my math skills are limited to college…
0
votes
1 answer

Automatic classification of items in a store: is it possible?

I have a database of items in a store. All of them are vegetables, fruits, nuts, berries, etc. I need to categorise them. For example, different types of potatoes should go under a single group (potato), tomatoes under tomato, etc. The most…
Anatoly
  • 5,056
  • 9
  • 62
  • 136
0
votes
1 answer

How to combine multiple feature sets in bag of words

I have text classification data with predictions depending on categories, 'descriptions' and 'components'. I could do the classification using bag of words in Python with scikit on 'descriptions'. But I want to get predictions using both…
0
votes
1 answer

How to Unit test a Naive Bayes word classifier?

I wrote a simple Naive Bayes word classifier. In simple terms, what it does is ... train( "some text A ...", "categoryA" ); train( "some text A ...", "categoryA" ); train( "some text B ...", "categoryB" ); train( "some text B ...", "categoryB"…
FFMG
  • 1,208
  • 1
  • 10
  • 24
0
votes
2 answers

An evaluation of text classification method with Reuters-21578 dataset

Please do not block me for this question. I have tried to find the answer for about a month and cannot find it, and you are my last hope (please, if you want to report it, answer me first and then report it, thanks). I write a hybrid text classification…
deansam
  • 68
  • 1
  • 2
  • 8
0
votes
0 answers

Suggestions or ideas on how to join the Naive Bayes model from training data and test data on Hadoop

I built my Naive Bayes classifier for text classification in Java. Now I am trying to port it to Hadoop. I have built the model using a mapper and reducer, and the output is like: label1,word1 count label1,word2 count label1,word3 …
Nicky
  • 333
  • 2
  • 4
  • 11
0
votes
1 answer

Document Tagging with Named Topics, relevant literature? (Also asked on Quora)

I am working on what is to me a very new domain in data science and would like to know if anyone can suggest any existing academic literature that has relevant approaches that address my problem. The problem setting is as follows: I have a set of…
0
votes
1 answer

Stanford classifier - Why?

Given that the Stanford Classifier is relatively new, what added value does it offer to users of Weka or RapidMiner working on text ML?
user1439579
  • 131
  • 10
0
votes
1 answer

How to create a word map for custom text for text classification in R?

I am trying to implement a text classification program in R that classifies input text (args) into 3 different classes. I have successfully tested the sample program by dividing the input data into training and test data. I would now like to build…
0
votes
1 answer

Adding custom features to Stanford NER without touching the source code

I have added custom features to my Stanford NER model as suggested in the following link: Stanford-NER customization to classify software programming keywords. I was wondering, is there any better approach at the moment to add custom features…
0
votes
1 answer

Classify PDF files by their name

I have a list of PDF files (their names). For example, Financial_Statement_Q1_2015_En belongs to Quarterly Report, and Financial_Statement_Yealy_2015 belongs to Not Quarterly Report. I need to classify the names of the PDFs as Quarterly or Not Quarterly…
Raja
  • 33
  • 5
0
votes
0 answers

Single label train set to produce a multilabel output scikit-learn one vs rest

I was wondering whether it is possible to use a single-label train set to produce a multilabel output, using the modified scikit-learn example below. The train set contains a number of sentences, each labelled either London or NY. At the moment,…
0
votes
1 answer

Get category from text or keywords

I managed so far to cluster and identify "trending topics" from tweets using 3 different approaches (LDA, SVD and k-means) with k=12. The problem now is to give a category to these topics. I used Alchemy API for text categorization. However, I am…
user4658980
0
votes
2 answers

Random Forest for multi-label classification

I am making an application for multilabel text classification. I've tried different machine learning algorithms. No doubt the SVM with a linear kernel gets the best results. I have also tried the Random Forest algorithm and the results…
Blunt
  • 529
  • 1
  • 8
  • 14
0
votes
1 answer

How to do online classification in Apache Mahout?

I have a big data set that I use to train a naive classifier using Apache Mahout. I use the classifier to classify a bunch of documents (this is like my test set). The way I classify documents is as follows: I find the normalized tf-idf vectors for…
HHH
  • 6,085
  • 20
  • 92
  • 164