Questions tagged [text-classification]

Put simply, text classification means assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that occurs in many real-world applications. For example, an automated call centre might want to categorise incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

binary (binary classification)
one category out of k possible categories (multi-class)
a set of categories out of k possible categories (multi-label).
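The three output types above differ mainly in how the target is encoded. A minimal sketch in plain Python, assuming a hypothetical three-category label set for the call-centre example (the category names and helper functions are illustrative, not from any library):

```python
# Hypothetical label set for a call-centre complaint classifier (an assumption).
CATEGORIES = ["billing", "network", "hardware"]


def encode_binary(is_complaint):
    """Binary classification: a single 0/1 target."""
    return 1 if is_complaint else 0


def encode_multiclass(label):
    """Multi-class: exactly one category out of k, stored as a class index."""
    return CATEGORIES.index(label)


def encode_multilabel(labels):
    """Multi-label: any subset of the k categories, stored as a 0/1 vector."""
    return [1 if category in labels else 0 for category in CATEGORIES]


binary_target = encode_binary(True)
multiclass_target = encode_multiclass("network")
multilabel_target = encode_multilabel({"billing", "hardware"})

print(binary_target, multiclass_target, multilabel_target)
```

In the multi-label case the target vector may contain several ones (or none), which is why it usually cannot be handled by a plain multi-class model without some adaptation.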

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
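To illustrate why text features tend to be sparse, here is a small bag-of-words sketch using only the standard library. The three documents and the whitespace tokenisation are assumptions made for the example; real pipelines typically use a library vectoriser and a proper tokeniser.

```python
from collections import Counter

# Toy corpus (hypothetical complaints, invented for illustration).
docs = [
    "my internet connection keeps dropping",
    "i was charged twice on my bill",
    "the router will not turn on",
]

# Vocabulary: every distinct whitespace-separated token in the corpus.
vocabulary = sorted({token for doc in docs for token in doc.split()})


def bag_of_words(doc):
    """Sparse representation: only tokens that occur are stored (token -> count)."""
    return dict(Counter(doc.split()))


def to_dense(sparse_vector):
    """Dense representation: one slot per vocabulary entry, mostly zeros."""
    return [sparse_vector.get(token, 0) for token in vocabulary]


sparse_vectors = [bag_of_words(doc) for doc in docs]
dense_vectors = [to_dense(vector) for vector in sparse_vectors]

# Fraction of entries in the dense document-term matrix that are zero.
nonzero = sum(len(vector) for vector in sparse_vectors)
total = len(docs) * len(vocabulary)
sparsity = 1 - nonzero / total

print(f"vocabulary size: {len(vocabulary)}, zero entries: {sparsity:.1%}")
```

Each document uses only a handful of the vocabulary's words, so most entries of the document-term matrix are zero. With a realistic vocabulary of tens of thousands of terms, the zero fraction is typically far higher, which is why sparse matrix formats are commonly used.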

1694 questions

votes

2 answers

Naive Bayes in Quanteda vs caret: wildly different results

I'm trying to use the packages quanteda and caret together to classify text based on a trained sample. As a test run, I wanted to compare the built-in naive Bayes classifier of quanteda with the ones in caret. However, I can't seem to get caret to…

asked Jan 29 '19 at 17:57

JBGruber

votes

1 answer

How to resample text (imbalanced groups) in a pipeline?

I'm trying to do some text classification using MultinomialNB, but I'm running into problems because my data is unbalanced. (Below is some sample data for simplicity. In actuality, mine is much larger.) I'm trying to resample my data using…

python pipeline text-classification resampling oversampling

asked Jan 09 '19 at 20:45

Kelsey

votes

4 answers

How can a machine learning model handle unseen data and unseen label?

I am trying to solve a text classification problem. I have a limited number of labels that capture the category of my text data. If the incoming text data doesn't fit any label, it is tagged as 'Other'. In the below example, I built a text…

machine-learning scikit-learn nlp text-classification naivebayes

asked Sep 17 '18 at 16:15

Prasanth Regupathy

votes

4 answers

Create ML Text Classifier probabilities

I am creating model with Create ML. I am using a JSON file. let data = try MLDataTable(contentsOf: URL(fileURLWithPath: "poems.json")) let (trainingData , testingData) = data.randomSplit(by: 0.8, seed: 0) let classifier = try…

swift text-classification coreml createml

asked Sep 07 '18 at 14:03

P S

votes

2 answers

LSTM Text Classification Bad Accuracy Keras

I'm going crazy in this project. This is multi-label text-classification with lstm in keras. My model is this: model = Sequential() model.add(Embedding(max_features, embeddings_dim, input_length=max_sent_len, mask_zero=True,…

keras lstm text-classification recurrent-neural-network multilabel-classification

asked Aug 22 '18 at 07:49

angelo curti giardina

votes

2 answers

SMOTE, Oversampling on text classification in Python

I am doing text classification and I have very imbalanced data like Category | Total Records Cate1 | 950 Cate2 | 40 Cate3 | 10 Now I want to oversample Cate2 and Cate3 so that they have at least 400-500 records, I prefer to use SMOTE over…

python machine-learning nlp text-classification resampling

asked Jun 23 '18 at 09:00

Vineet

votes

1 answer

How can I visualize border/decision function of two classes using scikit-learn

I am pretty new to machine learning, so I still don't understand how I can visualize the border between 2 classes in the bag-of-words case. I found the following example to plot data plot a document tfidf 2D graph from sklearn.datasets import…

python machine-learning scikit-learn svm text-classification

asked May 12 '18 at 10:25

Alexandr Bazarov

votes

2 answers

How do I build a model using GloVe word embeddings and predict on test data using text2vec in R

I am building a model to classify text data into two categories (i.e. classifying each comment into 2 categories) using GloVe word embeddings. I have two columns, one with textual data (comments) and the other one is a binary Target…

r word2vec text-classification word-embedding text2vec

asked Mar 05 '18 at 22:23

sri sivani charan

votes

1 answer

How to predict desired class using Naive Bayes in Text Classification

I have been implementing a Multinomial Naive Bayes classifier from scratch for text classification in Python. I calculate the feature count for each class and the probability distributions for features. According to my implementation I get the…

python machine-learning text-classification naivebayes

asked May 25 '17 at 08:37

Jahangir Alam

votes

2 answers

Best machine learning approach to automate text/fuzzy matching

I'm reasonably new to machine learning, I've done a few projects in python. I'm looking for advice on how to approach the below problem which I believe could be automated. A user in a data quality team in my organisation has a daily task of taking a…

machine-learning text-classification fuzzy-comparison record-linkage

asked Feb 16 '17 at 16:40

Anonymous

votes

2 answers

RNN for binary classification of sequence

I'm wondering if someone can suggest a good library or reference (tutorial or article) to implement a Recurrent Neural Network (RNN). I tried to use the rnnlib by Alex Graves, but I had some trouble changing the architecture to adapt the network…

deep-learning regular-language text-classification recurrent-neural-network

asked Nov 09 '16 at 18:11

G_Zak

votes

1 answer

Adding Special Case Idioms to Python Vader Sentiment

I've been using Vader Sentiment to do some text sentiment analysis and I noticed that my data has a lot of "way to go" phrases that were incorrectly being classified as neutral: In[11]: sentiment('way to go John') Out[11]: {'compound': 0.0, 'neg':…

python sentiment-analysis text-classification

asked Dec 21 '15 at 16:43

Jason

votes

2 answers

Large classification document corpus

Can anyone point me to a large corpus that I can use for classification? By large I don't mean Reuters or 20 newsgroups; I'm talking about a corpus of GB size, not 20MB or something like that. I was only able to find this Reuters and 20…

dataset classification corpus text-classification

asked Aug 27 '15 at 10:17

Kobe-Wan Kenobi

votes

1 answer

How to use spark Naive Bayes classifier for text classification with IDF?

I want to convert text documents into feature vectors using tf-idf, and then train a naive bayes algorithm to classify them. I can easily load my text files without the labels and use HashingTF() to convert it into a vector, and then use IDF() to…

python apache-spark tf-idf text-classification apache-spark-mllib

asked Aug 26 '15 at 15:43

zsyp

votes

2 answers

SMOTE oversampling and cross-validation

I am working on a binary classification problem in Weka with a highly imbalanced data set (90% in one category and 10% in the other). I first applied SMOTE (http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node6.html) to the…

machine-learning weka text-classification

asked Aug 06 '15 at 12:52

kverr
