Questions tagged [text-classification]

Simply put, text classification is the task of assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that occurs in many real-world applications. For example, an automated call centre might want to categorise incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

one of two categories (binary)
one category out of k possible categories (multi-class)
a set of categories out of k possible categories (multi-label).
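
As a rough illustration of the three output types listed above, the following Python sketch scores a text against keyword lists and derives a binary, a multi-class and a multi-label answer from the same scores. The categories, keywords and function names are invented for this example (loosely following the call-centre scenario); a real classifier would normally learn its scores from labelled data.

```python
# Hypothetical categories and keywords, invented for illustration only.
KEYWORDS = {
    "billing": {"invoice", "charge", "refund"},
    "network": {"signal", "outage", "slow"},
    "hardware": {"router", "cable", "broken"},
}


def scores(text):
    """Count how many keywords of each category occur in the text."""
    tokens = set(text.lower().split())
    return {cat: len(tokens & words) for cat, words in KEYWORDS.items()}


def binary(text, category):
    """Binary: does the text belong to one given category or not?"""
    return scores(text)[category] > 0


def multi_class(text):
    """Multi-class: exactly one category out of k (highest score wins)."""
    s = scores(text)
    return max(s, key=s.get)


def multi_label(text):
    """Multi-label: every category with at least one keyword hit."""
    return {cat for cat, n in scores(text).items() if n > 0}


complaint = "my router is broken and the signal drops"
print(binary(complaint, "billing"))
print(multi_class(complaint))
print(sorted(multi_label(complaint)))
```

The same input yields a yes/no answer, a single label, or a set of labels depending only on how the per-category scores are turned into a decision.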

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
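
To make the sparsity concrete, here is a minimal sketch assuming a plain bag-of-words representation (one feature per distinct word, which is only one of several possible feature extractors). The three example documents and the helper names are made up for illustration.

```python
from collections import Counter

# Toy corpus, invented for illustration only.
docs = [
    "the invoice shows a wrong charge",
    "the router is broken",
    "no signal since the outage",
]

# Vocabulary: one feature (column) per distinct word in the corpus.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}


def dense_vector(doc):
    """One count per vocabulary word; most entries are zero."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]


def sparse_vector(doc):
    """Store only the non-zero entries as {column index: count}."""
    return {index[w]: c for w, c in Counter(doc.split()).items()}


dense = dense_vector(docs[1])
sparse = sparse_vector(docs[1])
print(len(vocab), "features in total;", len(sparse), "non-zero for", repr(docs[1]))
```

With a realistic vocabulary of tens of thousands of words, a single document still touches only a handful of columns, which is why sparse storage is the usual choice for text features.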

1694 questions

1 answer

naive Bayes classification error 'non-numeric argument to mathematical function'

Update I have a problem setting up my text classification using naive bayes. First I have 3 text files, two templates with good/bad words, one testing file. My TermDocumentMatrix is created and I also have a vector of rating, according my previous…

r text-classification naivebayes

asked Feb 20 '17 at 14:58

wolf_wue

2 answers

Dealing with differences in feature space regarding text classification using SVM

I asked this question on the R mailing list, but I think this is a better place to look for answers and tips. I'm currently working on text classification of students' essays, trying to identify texts that fit a certain class or not. I use…

r svm text-classification

asked Feb 20 '17 at 08:49

PsyR

0 answers

Python confusion matrix analysis in Multinomial Naive Bayes - Scikit Learn

I am solving a document classification problem with Python and scikit-learn. I used CountVectorizer to get word counts from the text documents and a MultinomialNB classifier for class predictions. My model gives 94.5% accuracy. I am still…

python-3.x scikit-learn text-classification naivebayes confusion-matrix

asked Feb 19 '17 at 03:37

Rizwan

1 answer

Text classification without machine learning

I would like to match social media posts (short text) to a database of movies/TV shows. The database contains information on movie or TV show names, characters and actors. If enough evidence is found in the input text, then I want the algorithm to…

database sqlite python-3.x text text-classification

asked Feb 13 '17 at 19:17

humma4

2 answers

With open() statement with Naive Bayes Classifier takes too long

I have a CSV file with 3483 lines, 460K characters and 65K words, and I'm trying to use this corpus to train a NaiveBayes classifier in scikit-learn. The problem is that when I use the statement below, it takes too long (1 hour and did not…

python machine-learning scikit-learn text-classification naivebayes

asked Feb 12 '17 at 14:20

Flavio

0 answers

Reuters dataset classes

I am researching text classification using SVM. I am using the Reuters-21578 ModApte dataset in ARFF format and classifying it using Weka. I am getting two classes after classification, viz. (-inf-0.5] and (0.5-inf). What are these classes? And how…

svm weka libsvm text-classification reuters

asked Feb 09 '17 at 09:07

Shubham_2901

2 answers

Text Classification - Label Pre Process

I have a data set of 1M+ observations of customer interactions with a call center. The text is free text written by the representative taking the call. The text is not well formatted nor is it close to being grammatically correct (a lot of short…

python r nlp preprocessor text-classification

asked Feb 05 '17 at 05:01

meb33

1 answer

suggest list of how-to articles based on text content

I have 20,000 messages (combination of email and live chat) between my customer and my support staff. I also have a knowledge base for my product. Often times, the questions customers ask are quite simple and my support staff simply point them to…

search machine-learning text-classification azure-cognitive-services

asked Feb 04 '17 at 05:09

apexdodge

0 answers

Determine category for a given URL

I would like to determine for a given URL as an input a category from a list of categories (e.g. programming, health, vegan food, computer science, math). cats = [ "programming", "health", "raw vegan food", "vegan cooking", "computer science",…

python text-classification

asked Jan 26 '17 at 23:25

xralf

0 answers

Classifier Accuracy - Too good to believe

Problem statement: classify product reviews into the classes Travel, Hotel, Cars, Electronics, Food, Movies. I am approaching this as the well-known text classification problem. The feature set is prepared using the default Doc2Vec model from gensim and for…

python pca gensim text-classification doc2vec

asked Jan 11 '17 at 15:10

Rashmi Singh

1 answer

Predicting from SciKitLearn RandomForestClassification with Categorical Data

I created a RandomForestClassification model using SkLearn using 10 different text features and a training set of 10000. Then, I pickled the model (76mb) in hopes of using it for prediction. However, in order to produce the Random Forest, I used…

python machine-learning scikit-learn random-forest text-classification

asked Jan 05 '17 at 23:01

JV88V

1 answer

Cross Validation classification error

I am using the following code to get the classification results: folds = 5 #number of folds for the cv #Logistic Regression-- clf = linear_model.LogisticRegression(penalty='l1') kf = KFold (len(clas), n_folds=folds) …

python machine-learning scikit-learn cross-validation text-classification

asked Dec 21 '16 at 19:23

Karan Kothari

1 answer

What is the formal process of cleaning unstructured data

I needed help with a couple of things.. I am new to NLP and unstructured data cleaning.. can someone answer the following questions... Thanks need help with regex to identify words like _male and female_ or more generic like _word and word_ or…

python nlp text-classification data-cleaning

asked Dec 21 '16 at 15:12

Karan Kothari

1 answer

Error in mx.sym.Reshape() from http://mxnet.io/tutorials/nlp/cnn.html

I'm trying to follow the Text Classification Tutorial on http://mxnet.io/tutorials/nlp/cnn.html Until I call the function: conv_input = mx.sym.Reshape(data=embed_layer, target_shape=(batch_size, 1, sentence_size, num_embed)) everything goes well.…

python convolution text-classification mxnet

asked Dec 15 '16 at 12:37

hag o hi

1 answer

Decision Tree nltk

I am trying different learning methods (Decision Tree, NaiveBayes, MaxEnt) to compare their relative performance to get to know the best method among them. How to implement the Decision Tree and get its accuracy? import string from sklearn.tree…

python classification decision-tree text-classification maxent

asked Dec 14 '16 at 16:25

user7269547
