Questions tagged [text-classification]

Simply put, text classification means assigning a piece of text to one or more categories, which are usually predefined. It is an important problem that arises in many real-world applications. For example, an automated call centre may want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a special case of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • binary (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).
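The three output types listed above can be illustrated with a toy keyword matcher. This is only a sketch: the category names, the keyword lists and the function names are all hypothetical, and a real system would normally use a trained model rather than hand-written rules.

```python
# Hypothetical keyword lists for a call-centre example; not taken from any real system.
KEYWORDS = {
    "billing": {"invoice", "charge", "refund"},
    "network": {"signal", "outage", "slow"},
    "hardware": {"router", "handset", "screen"},
}


def scores(text):
    """Count how many keywords of each category occur in the text."""
    tokens = set(text.lower().split())
    return {cat: len(tokens & words) for cat, words in KEYWORDS.items()}


def classify_binary(text, category="billing"):
    """Binary: does the text belong to one given category or not?"""
    return scores(text)[category] > 0


def classify_multiclass(text):
    """Multi-class: exactly one category out of k (ties broken alphabetically)."""
    s = scores(text)
    return max(sorted(s), key=lambda cat: s[cat])


def classify_multilabel(text):
    """Multi-label: any subset of the k categories, possibly empty."""
    return {cat for cat, n in scores(text).items() if n > 0}


complaint = "my router has no signal and the invoice shows a double charge"
print(classify_binary(complaint))      # a single yes/no answer
print(classify_multiclass(complaint))  # a single category name
print(classify_multilabel(complaint))  # a set of category names
```

The only difference between the three functions is the shape of the return value: a boolean, one label, or a set of labels.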

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
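To see why text features tend to be sparse, here is a minimal bag-of-words sketch in plain Python. The three sample documents are made up, and splitting on whitespace is a simplifying assumption; libraries such as scikit-learn apply the same idea with proper tokenisation and sparse matrix types.

```python
from collections import Counter

# Made-up sample documents, for illustration only.
docs = [
    "the line keeps dropping calls",
    "refund the double charge on my bill",
    "the router light is red",
]

# Vocabulary: one feature (column) per distinct word across all documents.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}


def sparse_vector(doc):
    """Map column index -> word count, storing only the non-zero entries."""
    counts = Counter(doc.split())
    return {index[word]: count for word, count in counts.items()}


def dense_vector(doc):
    """The same features as a full-length list, zeros included."""
    sparse = sparse_vector(doc)
    return [sparse.get(i, 0) for i in range(len(vocab))]


for doc in docs:
    vec = sparse_vector(doc)
    print(f"{len(vec)} of {len(vocab)} features are non-zero")
```

Each document uses only a few words of the whole vocabulary, so most entries of its dense vector are zero. With a realistic vocabulary of tens of thousands of words, the fraction of non-zero entries becomes far smaller still.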

1694 questions
0
votes
1 answer

naive Bayes classification error 'non-numeric argument to mathematical function'

Update I have a problem setting up my text classification using naive Bayes. First I have 3 text files, two templates with good/bad words, one testing file. My TermDocumentMatrix is created and I also have a vector of rating, according to my previous…
wolf_wue
  • 296
  • 1
  • 15
0
votes
2 answers

Dealing with differences in feature space regarding text classification using SVM

I asked this question on the R mailing list, but I think this is a better place to look for answers and tips. I'm currently working on text classification of students' essays, trying to identify texts that fit a certain class or not. I use…
PsyR
  • 21
  • 6
0
votes
0 answers

Python confusion matrix analysis in Multinomial Naive Bayes - Scikit Learn

I am solving one document classification problem with Python Scikit learn. I have used CountVectorizer to get word counts from the text documents. And used MultinomialNB classifier for class predictions. My model is giving 94.5% accuracy. I am still…
0
votes
1 answer

Text classification without machine learning

I would like to match social media posts (short text) to a database of movies/TV shows. The database contains information on movie or TV show names, characters and actors. If enough evidence is found in the input text, then I want the algorithm to…
humma4
  • 11
  • 2
0
votes
2 answers

With open() statement with Naive Bayes Classifier takes too long

I have a csv file with 3483 lines and 460K characters and 65K words, and I'm trying to use this corpus to train a NaiveBayes classifier in Scikit-learn. The problem is when I use this statement below, takes too long (1 hour and did not…
0
votes
0 answers

Reuters dataset classes

I am researching text classification using SVM. I am using the Reuters-21578 ModApte dataset in ARFF format and classifying it using Weka. I am getting two classes after classification, viz. (-inf-0.5] and (0.5-inf). What are these classes? And how…
0
votes
2 answers

Text Classification - Label Pre Process

I have a data set of 1M+ observations of customer interactions with a call center. The text is free text written by the representative taking the call. The text is not well formatted nor is it close to being grammatically correct (a lot of short…
meb33
  • 31
  • 1
  • 5
0
votes
1 answer

suggest list of how-to articles based on text content

I have 20,000 messages (combination of email and live chat) between my customer and my support staff. I also have a knowledge base for my product. Often times, the questions customers ask are quite simple and my support staff simply point them to…
0
votes
0 answers

Determine category for a given URL

I would like to determine for a given URL as an input a category from a list of categories (e.g. programming, health, vegan food, computer science, math). cats = [ "programming", "health", "raw vegan food", "vegan cooking", "computer science",…
xralf
  • 3,312
  • 45
  • 129
  • 200
0
votes
0 answers

Classifier Accuracy - Too good to believe

Problem Statement - Classify a product review classes - Travel,Hotel,Cars,Electronics,Food,Movies I am approaching this problem with the famous Text Classification problem. Feature set is prepared by using Doc2Vec default model from gensim and for…
Rashmi Singh
  • 519
  • 1
  • 8
  • 20
0
votes
1 answer

Predicting from SciKitLearn RandomForestClassification with Categorical Data

I created a RandomForestClassification model using SkLearn using 10 different text features and a training set of 10000. Then, I pickled the model (76mb) in hopes of using it for prediction. However, in order to produce the Random Forest, I used…
0
votes
1 answer

Cross Validation classification error

I am using the following code to get the classification results: folds = 5 #number of folds for the cv #Logistic Regression-- clf = linear_model.LogisticRegression(penalty='l1') kf = KFold (len(clas), n_folds=folds) …
0
votes
1 answer

What is the formal process of cleaning unstructured data

I needed help with a couple of things.. I am new to NLP and unstructured data cleaning.. can someone answer the following questions... Thanks need help with regex to identify words like _male and female_ or more generic like _word and word_ or…
Karan Kothari
  • 91
  • 2
  • 12
0
votes
1 answer

Error in mx.sym.Reshape() from http://mxnet.io/tutorials/nlp/cnn.html

I'm trying to follow the Text Classification Tutorial on http://mxnet.io/tutorials/nlp/cnn.html Until I call the function: conv_input = mx.sym.Reshape(data=embed_layer, target_shape=(batch_size, 1, sentence_size, num_embed)) everything goes well.…
hag o hi
  • 117
  • 1
  • 1
  • 9
0
votes
1 answer

Decision Tree nltk

I am trying different learning methods (Decision Tree, NaiveBayes, MaxEnt) to compare their relative performance to get to know the best method among them. How to implement the Decision Tree and get its accuracy? import string from sklearn.tree…
user7269547