Questions tagged [text-classification]

Simply put, text classification means assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that occurs in many real-world applications. For example, an automated call centre might want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • binary (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).
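
The three output types above differ mainly in the shape of the target. A minimal sketch in plain Python, assuming a hypothetical set of k = 3 call-centre categories (the category names and helper functions are illustrative, not taken from any library):

```python
# Hypothetical categories for a call-centre complaint classifier (k = 3).
CATEGORIES = ["billing", "delivery", "refund"]


def one_hot(label, categories=CATEGORIES):
    """Multi-class target: exactly one of the k categories is active."""
    return [1 if c == label else 0 for c in categories]


def multi_hot(labels, categories=CATEGORIES):
    """Multi-label target: any subset of the k categories may be active."""
    chosen = set(labels)
    return [1 if c in chosen else 0 for c in categories]


binary_target = 1                                     # binary: complaint (1) vs. not (0)
multiclass_target = one_hot("refund")                 # one category out of k
multilabel_target = multi_hot(["billing", "refund"])  # a set of categories out of k
```

A multi-class target always has exactly one active position, whereas a multi-label target may have none, one, or several.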

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
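
To illustrate why text features tend to be sparse, here is a small bag-of-words sketch using only the Python standard library. The three documents are made up for illustration, and storing each row as a `{column_index: count}` dictionary is just one possible sparse representation, not the format of any particular library:

```python
from collections import Counter

# Toy corpus (hypothetical example documents).
docs = [
    "customer wants refund",
    "customer is not happy",
    "parcel arrived late",
]

# Vocabulary: one feature column per distinct word in the corpus.
vocabulary = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocabulary)}


def to_sparse(doc):
    """Keep only the non-zero word counts as {column_index: count}."""
    counts = Counter(doc.split())
    return {index[w]: c for w, c in counts.items() if w in index}


sparse_rows = [to_sparse(doc) for doc in docs]

# Fraction of cells in the full document-term matrix that are non-zero.
nonzero = sum(len(row) for row in sparse_rows)
density = nonzero / (len(docs) * len(vocabulary))
```

Each document uses only a few of the vocabulary's words, so most cells of the full document-term matrix are zero. On a realistic corpus with tens of thousands of distinct words, the density is typically far lower than in this toy case.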

1694 questions
0
votes
0 answers

How to split the sparse matrix in Scipy?

I'm using sklearn to classify documents. But I got in trouble splitting the sparse matrix produced by TfidfTransformer which contains the corpus of both the train and the test data. Here is part of my code: vectorizer = CountVectorizer() transformer…
0
votes
1 answer

Different Representation of Full file access paths by malware

I am currently using dynamic analysis for malware detection. I have a list of all the files accessed by malware and benign executables. My aim is to build classifiers on the information extracted from the analysis reports. As of now I am using the…
0
votes
1 answer

How to use hmmlearn to classify English text?

I want to implement a classic Markov model problem: Train MM to learn English text patterns, and use that to detect English text vs. random strings. I decided to use hmmlearn so I don't have to write my own. However I am confused about how to train…
Superbest
0
votes
1 answer

StringToWordVector error in Java for text classification

I am trying to apply the StringToWordVector filter to text in Java, but it does not work: the output of the filter is incorrect. The code that I used: Instances instances =…
F Arwa
0
votes
0 answers

Text Classification in R

Hi, I have a dataset where a call centre agent types comments against a client ID. We have to classify these comments into different categories based on the common words in them. For example, "customer wants refund" or "customer is not happy wants a…
0
votes
1 answer

Pandas dataframe indexes

Let's say I am working with a dataset which has 10 columns. Now, the Label column for my 'Y' is 1. How do I set my X and Y? This is what I have done so far: array = dataframe.values X = array[:,0:2:32] #I know this isn't the right way to do…
someone
0
votes
0 answers

Maximum Entropy using Stanford Classifier

I am doing sentiment analysis of Twitter text and want to do it using Maximum Entropy and SVM. I looked up the Stanford Classifier but cannot find its implementation in Java. Can anyone guide me on where to start?
0
votes
2 answers

Text recognition and detection using TensorFlow

I am working on a text recognition project. I have built a classifier using TensorFlow to predict digits but I would like to implement a more complex algorithm of text recognition by using text localization and text segmentation (separating each…
0
votes
1 answer

What are some good resources for multi-class text classification using word2vec followed by SVM/ANN / Deep Networks?

I need to implement a multi-class text classifier. I thought of using word2vec; can someone point me to good papers/resources which talk about this? I would have 4-5 classes and I have loads of data. I have to manually label some of them. It would…
susheel
0
votes
0 answers

How to classify these sentences as positive OR negative?

I have a list of comments made by executives. They are never the same (very unlikely). They indicate the overall sentiment of the company's performance. My objective is to use the past comments to train a classifier and sort the future comments as…
0
votes
0 answers

How to use HOG (Histogram of Oriented Gradients) with k-means clustering for clustering text images?

I have used this HOG (Histogram of Oriented Gradients) code for clustering text images. I got a 1x1440 row vector. How can I pass it into the k-means clustering algorithm? img = imread('1.jpg'); [featureVector,hogVisualization] =…
0
votes
1 answer

How to de-identify specific words in the text by list using apache spark?

I want to identify those sentences that contain some specific words. As you will see in my code, I have defined some terms and sentences. I want to print all the sentences that contain these defined terms. Here is my code: import…
0
votes
1 answer

How to classify text with an estimator?

I trained estimator with this: def train_estimator(feature_list, expected_values, k=5): pipeline = Pipeline([('vect', CountVectorizer(input='filename', stop_words='english')), ('clf', MultinomialNB())]) parameters =…
Jay
0
votes
0 answers

Text classification: Naïve Bayes classifier with skewed data distribution

I have a question about the Naïve Bayes classifier with skewed data distributions for training and test data. The training data has 90% spam and 10% non-spam; the test data has 80% non-spam and 20% spam. Would it be better to use MLE (max. likelihood) than…
Shraddha
0
votes
1 answer

How to link Weka 10-fold CV predicted results back to the original comments for text classification

Is there any way I can map my predicted results back to the original comments after text classification using 10-fold cross-validation? From the result of 2000 comments of class non-sarc and sarc:…
Suhairi Suhaimin