Questions tagged [text-classification]

Simply put, text classification means assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that arises in many real-world applications. For example, an automated call centre may want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a special case of the more general classification problem. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output can be:

one of two categories (binary classification)
one category out of k possible categories (multi-class)
a set of categories out of k possible categories (multi-label).
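The three output types can be illustrated with a minimal sketch. The category names, scores, and the 0.5 threshold below are made-up assumptions for illustration, not the output of any particular classifier; the point is only how the same per-category scores lead to different kinds of predictions.

```python
# Hypothetical per-category scores for one complaint (names and numbers are invented).
scores = {"billing": 0.72, "network": 0.55, "hardware": 0.08}


def predict_binary(score, threshold=0.5):
    """Binary: a single yes/no decision for one category."""
    return score >= threshold


def predict_multiclass(scores):
    """Multi-class: exactly one category, the one with the highest score."""
    return max(scores, key=scores.get)


def predict_multilabel(scores, threshold=0.5):
    """Multi-label: every category whose score clears the threshold."""
    return {label for label, s in scores.items() if s >= threshold}


is_billing = predict_binary(scores["billing"])
single_label = predict_multiclass(scores)
label_set = predict_multilabel(scores)
```

In the multi-label case the result may contain several categories or none at all, whereas the multi-class case always returns exactly one.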

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
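To see why text features tend to be sparse, here is a small bag-of-words sketch using only the Python standard library. The example documents and the whitespace tokenisation are assumptions chosen for illustration; real pipelines typically use a library vectoriser and a much larger vocabulary, which makes the effect far stronger.

```python
from collections import Counter

# Made-up example documents (assumption, for illustration only).
docs = [
    "my internet connection keeps dropping",
    "I was charged twice on my bill",
    "the connection is slow and the bill is wrong",
]

# Build the vocabulary: one column index per distinct lowercase token.
vocabulary = {}
for doc in docs:
    for token in doc.lower().split():
        vocabulary.setdefault(token, len(vocabulary))


def bag_of_words(text, vocabulary):
    """Return a sparse {column_index: count} mapping for one document."""
    counts = Counter(text.lower().split())
    return {vocabulary[w]: c for w, c in counts.items() if w in vocabulary}


vectors = [bag_of_words(doc, vocabulary) for doc in docs]

# Fraction of cells in the document-term matrix that are non-zero.
nonzero = sum(len(v) for v in vectors)
density = nonzero / (len(docs) * len(vocabulary))
```

Each document only stores the columns for words it actually contains, so most entries of the full document-term matrix are zero and never need to be materialised.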

1694 questions

2 answers

Training classifier with large data

I was trying two-class text classification. Usually I create pickle files of the trained model and load those pickles in the training phase to avoid retraining. When I had 12000 reviews + more than 50000 tweets for each of the classes, the training…

asked Mar 17 '16 at 07:00

user123

0 answers

Predicting the "no class" / unrecognised class in Weka Machine Learning

I am using Weka 3.7 to classify text documents based on their content. I have a set of text files in folders and they all belong to a certain category. Category A: 100 txt files Category B: 100 txt files ... Category X: 100 txt files I want to…

machine-learning classification weka random-forest text-classification

asked Mar 08 '16 at 15:17

Marc Giombetti

1 answer

Save progress between multiple instances of partial_fit in Python SGDClassifier

I've successfully followed this example for my own text classification script. The problem is I'm not looking to process pieces of a huge, but existing data set in a loop of partial_fit calls, like they do in the example. I want to be able to add…

python-3.x machine-learning scikit-learn text-classification

asked Feb 26 '16 at 22:05

Bubba

1 answer

Load and save Weka Model using Java API?

I have my model on my hard drive at d:\MultiNomial.model. That model runs correctly from Weka. The model was built to classify text using StringToVector as a filter. I am using Java to load that model with the Weka API. This is my source…

java weka text-classification

asked Feb 20 '16 at 18:54

Lylia John

1 answer

Emotion Classification in Text Using R

I have an enormous data set of texts, from which I have separated the texts that contain particular keywords. Here is the data set with particular keywords. Now my next task is to classify this data set according to 8 emotions and 2 sentiments, in total…

data-mining text-mining sentiment-analysis text-classification emotion

asked Feb 13 '16 at 13:32

user5462317

1 answer

Scikit-learn: precision_recall_fscore_support returns strange results

I am doing some text mining/classification and am attempting to evaluate performance with the precision_recall_fscore_support function from the sklearn.metrics module. I am not sure how I can create a really small example reproducing the problem, but…

python machine-learning scikit-learn classification text-classification

asked Feb 05 '16 at 20:19

oarfish

1 answer

How to combine and feed different features to an algorithm for text classification

I've got some 120k text files and 12 categories into which I want to classify these documents. I'm using a simple bag-of-words model and feeding it to Naive Bayes. But I was told that using a mixture of features would "help" OR rather I should…

python nlp classification feature-selection text-classification

asked Jan 11 '16 at 12:46

user4069366

1 answer

R - Automatic categorization of Wikipedia articles

I have been trying to follow this example by Norbert Ryciak, whom I haven't been able to get in touch with. Since this article was written in 2014, some things in R have changed, so I have been able to update some of those things in the code, but I…

r text-classification

asked Dec 22 '15 at 20:18

tomcontr

1 answer

Compare documents by sequence vector

I'm trying to classify documents by sequence vector. Basically, I have a vocabulary (more than 5000 words). Each document is converted to a vector of integers so that each element in the vector corresponds to the position of the word in the…

matlab vector nlp text-classification document-classification

asked Dec 09 '15 at 15:53

lenhhoxung

1 answer

Letter classifier inaccuracy

I am working on a university project to detect letters from a photo. I can successfully extract words from the photo and cut them into single letters, which are black on a white background. These pictures look quite clear. I have trained the SVC…

python text scikit-learn classification text-classification

asked Nov 22 '15 at 16:18

Ghostwriter

1 answer

Determining the name of a company from a given text

I have a site which is in the stock market domain. The site has a lot of user generated content in terms of forum posts, comments etc. Also, I have a database table that consists of names of all companies (around 5000) listed in the stock…

c# classification text-classification

asked Oct 26 '15 at 07:37

milan m

1 answer

R caret package (rpart)

I get the error below when using the rpart library: dt <- rpart(formula, method="class", data=full.df.allAttr.train); Error in model.frame.default(formula = formula, data = full.df.allAttr.train, : object is not a matrix When I convert…

r text-classification rpart

asked Oct 20 '15 at 13:42

user2478236

1 answer

Need help applying scikit-learn to this unbalanced text categorization task

I have a multi-class text classification/categorization problem. I have a set of ground truth data with K different mutually exclusive classes. This is an unbalanced problem in two respects. First, some classes are a lot more frequent than others.…

scikit-learn feature-selection text-classification precision-recall

asked Oct 16 '15 at 13:51

I Z

1 answer

How to select best parameters for SVM linear kernel type

I am performing a two-label classification using libsvm, but I don't get good results with the default parameters for the linear SVM kernel type. Can anyone please tell me a way to find the best parameters for the linear SVM kernel type?

weka svm libsvm text-classification

asked Oct 15 '15 at 08:02

user5232014

1 answer

Naive Bayes with Apache Spark MLlib

I'm using Naive Bayes with Apache Spark MLlib for text classification, following this tutorial: http://avulanov.blogspot.com/2014/08/text-classification-with-apache-spark.html /* instantiate Spark context (not needed for running inside Spark shell */ val sc…

scala apache-spark text-classification naivebayes apache-spark-mllib

asked Oct 13 '15 at 07:22

Thanh Thai Nguyen
