Questions tagged [text-classification]

Simply put, text classification means assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that occurs in many real-world applications. For example, an automated call centre might want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • binary (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).
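As a rough illustration of the three output types above, here is a minimal sketch of how the targets might be encoded. The category names and helper functions are hypothetical, chosen only to match the call-centre example; real libraries offer their own encoders.

```python
# Hypothetical category names for illustration; not taken from any real dataset.
CATEGORIES = ["billing", "network", "hardware"]


def encode_multiclass(label):
    """Multi-class: exactly one category, stored as its index in CATEGORIES."""
    return CATEGORIES.index(label)


def encode_multilabel(labels):
    """Multi-label: any subset of categories, stored as a 0/1 indicator vector."""
    chosen = set(labels)
    return [1 if category in chosen else 0 for category in CATEGORIES]


# Binary: a single yes/no decision, e.g. "is this text a complaint?"
binary_target = 1

# Multi-class: one category out of k = 3
multiclass_target = encode_multiclass("network")

# Multi-label: a set of categories out of k = 3
multilabel_target = encode_multilabel(["billing", "hardware"])

print(binary_target, multiclass_target, multilabel_target)
```

The only structural difference between multi-class and multi-label here is that the former yields a single index while the latter yields one 0/1 flag per category.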

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
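To see why text features tend to be sparse, consider a minimal bag-of-words sketch. The three documents, the whitespace tokenizer, and the function names are assumptions made for illustration; each document is stored as a mapping from vocabulary index to count, so only the non-zero entries are kept.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct lowercase token an integer index, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_sparse(doc, vocab):
    """Return {feature_index: count} holding only the tokens present in the document."""
    counts = Counter(token for token in doc.lower().split() if token in vocab)
    return {vocab[token]: count for token, count in counts.items()}


# Hypothetical toy complaints, made up for this sketch.
docs = [
    "my bill is wrong",
    "the network is down again",
    "my phone screen is cracked",
]

vocab = build_vocabulary(docs)
vectors = [to_sparse(doc, vocab) for doc in docs]

# Fraction of non-zero entries in the full documents-by-vocabulary matrix.
density = sum(len(vector) for vector in vectors) / (len(docs) * len(vocab))
print(len(vocab), round(density, 3))
```

Even with three short sentences, fewer than half of the matrix entries are non-zero; with a realistic vocabulary of tens of thousands of words, each document touches only a tiny fraction of the features.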

1694 questions
0
votes
1 answer

Should I tag sentences or whole reviews for the training set?

I am new to the analytics field, so this question may seem silly. I am working on review classification using R. I have to classify reviews into 50 different categories. I am manually tagging the data to train the model. I am bit…
Kishore
0
votes
1 answer

Weka Text Classification MultilayerPerceptron

My goal is to test how well a Multilayer Perceptron classifies the 20 newsgroups data. I keep getting only 5% accuracy with this method but can obtain ~90% with other classification methods such as Naive Bayes and KNN. I'm sure I am doing it wrong,…
GiH
0
votes
0 answers

Why is Naive Bayes Classifier not working? Values too small

Well, I wrote this code to classify my data. My data is of 5000 instances and 260 features. Each feature is binomial, i.e. if word "money" is in the instance that I am categorizing, then feature 23 is 1, otherwise 0 etc. There are 4 categories.…
0
votes
1 answer

StanfordCoreNLP object creation error

I am facing this issue: Exception in thread "main" java.lang.RuntimeException: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file) Caused by: java.io.InvalidClassException:…
0
votes
1 answer

Building weka classifier

I'm trying to build a classifier in Weka. I have two data sets: training and testing. The two files have the same structure, with the same number and type of attributes. However, the Weka Explorer is giving me an error saying Train and test set are not…
0
votes
1 answer

Convolutional neural network for multi-class text classification

I have a CSV file with two columns, 'sentence' is string of sentences, emoID is 1-7 integer such as following: sentence emoID During the period of falling in love. 1 When I was involved in a traffic accident. 2 ..... …
0
votes
1 answer

Can I use SGD with Multinomial Naive Bayes?

I'd like to understand whether I can train an MNB model with SGD, and whether that is a valid approach. My application is text classification. In sklearn I've found out that there is no MNB available, and by default it's SVM, however NB is the linear model,…
0
votes
1 answer

Text classification: how many dimensions does my data have?

I am classifying text using the bag of words model. I read in 800 text files, each containing a sentence. The sentences are then represented like…
user3813234
0
votes
1 answer

How can I use Chi-square value for text classification using SVM?

I have both positive and negative training documents for a text classification problem. I am planning on calculating chi-square value for every feature in each document. Having that value, how may I proceed to classification using SVM? What would be…
userAlma
0
votes
1 answer

Cheapest way to classify HTTP post objects

I can use SciPy to classify text on my machine, but I need to categorize string objects from HTTP POST requests in, or near, real time. What algorithms should I research if my goals are high concurrency, near real-time output and small memory…
Louisrr
0
votes
1 answer

Use POS tagging in bag of words

I'm using bag of words for text classification. Results aren't good enough: test set accuracy is below 70%. One of the things I'm considering is using POS tagging to distinguish the function of words. What is the go-to approach for doing it? I'm…
Luis Ramon Ramirez Rodriguez
0
votes
0 answers

Optimize Convolutional Neural Network

I am doing a small project using a convolutional neural network. My code is based on dennybritz. Here is my ConvNet code, and here are two functions for loading data and the CNN architecture. I am stuck in the evaluation process; my evaluation is always around 0.42 (The red star…
ngoduyvu
0
votes
1 answer

Python: How to calculate tf-idf for a large data set

I have the following data frame df, which I converted from an SFrame: URI name text 0
0
votes
0 answers

Python scikit-learn: how to build a model for multi-class and multi-label data?

I have a dataset like this: Description attributes.occasion.0 attributes.occasion.1 attributes.occasion.2 attributes.occasion.3 attributes.occasion.4 descr01 Chanukah Christmas Housewarming Just…
0
votes
0 answers

How do I classify an unlabelled dataset using the Weka API for Java?

I am currently trying to classify tweets based on sentiment (positive, negative, neutral). I have trained my naive Bayes using a training dataset... NaiveBayes nb = new NaiveBayes(); nb.buildClassifier(trainingData); I have tried labelling my…
T.newGuy1620