Questions tagged [text-classification]

Put simply, text classification means assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that arises in many real-world applications. For example, an automated call centre might want to categorise incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here, the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • binary (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).
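
The three output types above can be sketched as target encodings. This is a minimal illustration, not a prescribed format: the category names and helper functions (`one_hot`, `multi_hot`) are hypothetical and chosen only to echo the call-centre example.

```python
# Hypothetical category names, used only for illustration.
CATEGORIES = ["billing", "network", "hardware"]


def one_hot(label, categories):
    """Multi-class target: exactly one position is set to 1."""
    return [1 if c == label else 0 for c in categories]


def multi_hot(labels, categories):
    """Multi-label target: any number of positions may be set to 1."""
    chosen = set(labels)
    return [1 if c in chosen else 0 for c in categories]


# Binary: a single yes/no decision (e.g. "is this a complaint?").
binary_target = 1

# Multi-class: one category out of k.
multiclass_target = one_hot("network", CATEGORIES)

# Multi-label: a set of categories out of k.
multilabel_target = multi_hot(["billing", "hardware"], CATEGORIES)
```

The only structural difference between the last two is the constraint on how many positions may be 1, which is why multi-label problems are often handled as k independent binary problems.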

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
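
To make the sparsity point concrete, here is a small bag-of-words sketch in plain Python. The toy corpus, the whitespace tokenizer, and the dictionary-based sparse row format are all assumptions made for illustration; real pipelines typically use a library vectorizer instead.

```python
from collections import Counter

# Toy corpus, invented for illustration.
docs = [
    "my internet connection keeps dropping",
    "i was charged twice on my bill",
    "the router light is blinking red",
]


def tokenize(text):
    """Naive whitespace tokenizer (an assumption; real tokenizers do more)."""
    return text.lower().split()


# One feature (column) per distinct word in the corpus.
vocabulary = sorted({tok for doc in docs for tok in tokenize(doc)})
index = {tok: i for i, tok in enumerate(vocabulary)}


def to_sparse(text):
    """Store only the non-zero counts as {column_index: count}."""
    counts = Counter(tokenize(text))
    return {index[tok]: n for tok, n in counts.items() if tok in index}


sparse_rows = [to_sparse(doc) for doc in docs]

# Fraction of cells in the full document-term matrix that are non-zero.
nonzero = sum(len(row) for row in sparse_rows)
density = nonzero / (len(docs) * len(vocabulary))
```

Each document touches only the few columns for the words it contains, so the fraction of non-zero cells shrinks as the vocabulary grows, whereas every pixel of an image carries a value.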

1694 questions
0
votes
1 answer

Incremental/online learning using SGDClassifier partial_fit method

I have built an incremental learning model but am not sure whether it is right or wrong. I have 2 training datasets: the first consists of 20000 rows and the second of 10000 rows, both of them having two columns, description and id... In the case of offline learning my…
0
votes
1 answer

architecture: building text suggestions based on existing text

I am trying to build text suggestions similar to Stack Overflow question suggestions. I do not know where to start. Any suggestions on what tools/servers/algorithms I should be researching? Does this come under text classification? Any links towards this…
kumar
0
votes
1 answer

CNN converges to same accuracy regardless of hyperparameters, what does this indicate?

I have written tensorflow code based on: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/ but using precomputed word embeddings from the GoogleNews word2vec 300 dimension model. I created my own data from the…
Kevinj22
0
votes
0 answers

Classify text based on location/time from an establishing mention

Given a longer English text (> a few paragraphs), is there a rule-based NLP approach to classifying a set of text to be occurring at a place or time, from an establishing phrase? For example: Alice went to London. She met Bob at his hotel, and they…
0
votes
1 answer

Why are Logistic Regression and SVM predictions multiplied by constants at the end?

I'm currently trying to understand certain high-level classification problems and have come across some code from a Kaggle competition that ran in 2012. The competition discussion board is (here) and the winning code is (here). At almost the end of…
salvu
0
votes
1 answer

how to classify input text under different categories

text= "my dog is a rice eater", "I want to buy an a new","my cat prefers chocolate milk" How could I extract keywords from these texts (or text corpora) and classify them into different categories (i.e. dog, cat be categorized as Pet and rice,…
mzhasan
0
votes
1 answer

Extract topics from SMS messages

I have a dataset of SMS messages which is ill formatted and sparse. I tried to use topic modeling to get all the possible topics in each message with the probability of each associated topic. I need the probability to be able to arrange or rank each…
0
votes
0 answers

Dealing With Variations In Sparse Matrix

I am working on text data structuring. I need to make predictions using the following format: xyz@gmail.com -> Email, India -> Country, etc.... To achieve that, SVC along with OneVsRestClassifier is being used. The data extrapolation works just…
0
votes
1 answer

Using feature selection with LinearSVC in python

I have a task to create a multi class classifier for product titles to classify them into 11 categories. I'm using scikit's LinearSVC for classification. I preprocessed the product titles first by removing stopwords, using POS tags for…
0
votes
0 answers

Multiclass text classification

I have a question regarding a project in which I have to classify text. In this project I have several thousand questions (strings), which should be put into the categories tech, sports, politics, history, science and geography. My training data…
James No
0
votes
0 answers

Trying to get more performance on a text classification task

Right now I am working on a text classification task (trying to predict whether a Twitter response is human or bot generated). The task is actually a closed Kaggle competition, and more details as well as the datasets that were used can be found here:…
0
votes
1 answer

Found array with dim 3. Estimator expected <= 2

I am using LDA over a simple collection of documents. My goal is to extract topics, then use the extracted topics as features to evaluate my model. I decided to use multinomial SVM as the evaluator. Not sure if it's good or not? import itertools from…
sariii
0
votes
2 answers

Handling new features in classification models

I’m taking my first steps in ML, specifically with classifiers for text sentiment analysis. My approach is to make the usual split of 80% train dataset and 20% test. Having a trained model, what is the best way to proceed in a production environment when new…
0
votes
1 answer

Performance: Improve Accuracy of a Naive Bayes Classifier

I am working on a simple Naive Bayes Text Classifier which uses the Brown Corpus for test and training data. So far, I have gotten an accuracy of 53% when using the simple approach without any preprocessing. In order to improve my classifier, I've…
0
votes
1 answer

Feedback in NaiveBayes Text Classification

I am a newbie in machine learning. I am building a complaint categorizer and I want to provide a feedback model so that it can improve over time. import numpy from sklearn.feature_extraction.text import CountVectorizer from sklearn.naive_bayes import…