Questions tagged [text-classification]

Simply stated, text classification means assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that arises in many real-world applications. For example, an automated call centre may want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • one of two categories (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).
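
As a rough illustration of the three output types listed above, the sketch below encodes each as a target value. The category names and the helper functions `one_hot` and `multi_hot` are hypothetical, chosen only to echo the call-centre example; they do not come from any particular library.

```python
# Hypothetical categories for a call-centre setting (an assumption for illustration).
CATEGORIES = ["billing", "delivery", "technical"]


def one_hot(label, categories):
    """Multi-class target: exactly one of the k categories is active."""
    return [1 if c == label else 0 for c in categories]


def multi_hot(labels, categories):
    """Multi-label target: any subset of the k categories may be active."""
    chosen = set(labels)
    return [1 if c in chosen else 0 for c in categories]


# Binary: a single yes/no decision, e.g. 1 = complaint, 0 = not a complaint.
binary_target = 1

# Multi-class: the text belongs to exactly one category.
multi_class_target = one_hot("delivery", CATEGORIES)

# Multi-label: the text may belong to several categories at once.
multi_label_target = multi_hot(["billing", "delivery"], CATEGORIES)

print(binary_target)       # 1
print(multi_class_target)  # [0, 1, 0]
print(multi_label_target)  # [1, 1, 0]
```

The only structural difference between the multi-class and multi-label targets is the constraint on how many entries may be 1: exactly one in the first case, any number in the second.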

In text classification, the features extracted from the text (for example, bag-of-words counts) are usually sparse (rather than dense, as in image classification).
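
To make the sparsity concrete, here is a minimal sketch of a bag-of-words representation using only the Python standard library. The three sample documents, the whitespace tokenizer, and the function names are assumptions made for illustration; a real pipeline would normally use a library vectorizer and a much larger vocabulary.

```python
from collections import Counter

# Hypothetical sample documents (assumed for illustration only).
docs = [
    "my parcel arrived late",
    "the invoice amount is wrong",
    "the app crashes when I log in",
]


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


# Vocabulary: every distinct token in the corpus, each mapped to a column index.
vocabulary = sorted({tok for doc in docs for tok in tokenize(doc)})
index = {tok: i for i, tok in enumerate(vocabulary)}


def sparse_bow(text):
    """Return {column index: count}, storing only the non-zero entries."""
    counts = Counter(tokenize(text))
    return {index[tok]: n for tok, n in counts.items() if tok in index}


vec = sparse_bow(docs[0])
print("vocabulary size:", len(vocabulary))
print("non-zero entries in first document:", len(vec))
```

Each document touches only the few columns for the words it contains, so most of its feature vector is zero. With a realistic vocabulary of tens of thousands of terms the fraction of non-zero entries becomes far smaller, which is why such features are usually stored in a sparse format.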

1694 questions
0
votes
1 answer

Classifying words inside a document

The problem that I'm facing is: I want to read a document, get the raw string of this document, and classify the information. For example, I want to identify when the string is a "Name", or a "date" or some other useful information. Is it possible…
Eduardo Briguenti Vieira
0
votes
0 answers

Naive Bayes Classification: Understanding example correctly?

I am currently looking into the multinomial model for Naive Bayes classification, and have come across the following example: I think I understand everything, but I have developed the following reasoning I would like confirmed: For a given class…
0
votes
1 answer

Classification of sparse data

I am struggling with the best choice for a classification/prediction problem. Let me explain the task - I have a database of keywords from abstracts for different research papers, also I have a list of journals with specified impact factors. I want…
0
votes
1 answer

Should I use word2vec to do word embedding including testing data?

I am new to NLP and am trying to do a text classification task. I know that word embedding should be done beforehand. My question is: should I do the word embedding only on the training data (so that the testing data get vectors just from…
Nils Cao
0
votes
2 answers

Should I remove stopwords when feeding sentences to an RNN?

In a bag-of-words model, I know we should remove stopwords and punctuation before training. But in an RNN model, if I want to do text classification, should I remove stopwords too?
0
votes
1 answer

Weka Classification Project Using StringToWordVector and SMO

I am working on a project in which I have about 18 classes with about 4,000 total instances. I have 7 attributes, 1 being string data, the rest nominal. I am currently using StringToWordVector on the string attribute with Platt's SMO classifier,…
0
votes
2 answers

Scikit-learn: How to extract features from the text?

Assume I have an array of Strings: ['Laptop Apple Macbook Air A1465, Core i7, 8Gb, 256Gb SSD, 15"Retina, MacOS' ... 'another device description'] I'd like to extract from this description features like: item=Laptop brand=Apple model=Macbook Air…
Novitoll
0
votes
1 answer

Naive Bayes unseen features handling scikit learn

I am classifying small texts (tweets) using Naive Bayes (MultinomialNB) in scikit-learn. My training data has 1000 features, and my test data has 1200 features. Let's say 500 features are common to both the training and test data. I wonder why…
0
votes
1 answer

Text-Classification: Bag of words with MinMax-Scaler

I am trying to classify documents based on their bag-of-words representation (features: 1000). For the classification I am using an SVM, and it seems that sometimes the SVM doesn't terminate and runs endlessly. (Running scikit-learn: SVC(C=1.0,kernel='linear',…
0
votes
1 answer

Problems with Naive Bayes

I'm trying to run Naive Bayes in R for making predictions from textual data (by building a Document Term Matrix). I read several posts warning about terms that could be missing in both the training and the testing set, so I decided to work with only…
JorgeF
0
votes
1 answer

GATE machine learning doesn't work

I want to use the batch learning PR to do text classification in GATE. I first wrote this configuration XML, and it works.
Fan Yang
0
votes
0 answers

Naïve Bayes Algorithm Always comes out as 0

I have a question regarding the Naïve Bayes classification method. I ran through what I thought was an easy example but hit a snag. Basically here is the classification I would like to do: I want to be able to take some training data: input1 |…
0
votes
2 answers

I need to perform naive bayes text classification. Getting error while running the naiveBayes() method

I am getting an error while using the naiveBayes() method in R. I am passing as.matrix(train_matrix) as the first parameter and as.factor(train_data$subcategory) to the naiveBayes function. I am getting the error below: model <-…
Madhav pandey
0
votes
3 answers

Machine learning text classification where a text belongs to 1 to N classes

So I am trying (just for fun) to classify movies based on their description. The idea is to "tag" movies, so a given movie might be "action" and "humor" at the same time, for example. Normally when using a text classifier, what you get is the class…
0
votes
1 answer

How does GATE process machine learning (text classification)?

Taking the following sentence as an example (from the official GATE tutorial slides, module 11: https://gate.ac.uk/sale/talks/gate-course-may10/track-3/module-11-ml-adv/): I was told the item was in stock and next day delivery. After a couple of…
Fan Yang