Questions tagged [text-classification]

Simply stating, text classification is all about putting a piece of text into a set of (mostly predefined) categories. This is one of the most important problems which occurs in many real world applications. For example one example of text classification would be an automated call centre which would like to categorise the complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of a more general problem of classification. In this application, the input is represented with a piece of text (rather than images, sounds, videos etc). The output could be:

  • binary (binary classification)
  • one category out of k possible categories (multi-class)
  • a set of categories out of k possible categories (multi-label).

In text classification, the feature extracted from the text are usually sparse (instead of dense, like in image classification).

1694 questions
0
votes
1 answer

Combining w2vec and feature selection in pipeline

Based on this article: http://nadbordrozd.github.io/blog/2016/05/20/text-classification-with-word2vec/ I am trying to implement a gensim word2vec model with the pretrained vectors of GloVe in a text classification task. However, I would like to do…
Vas
  • 343
  • 4
  • 18
0
votes
1 answer

Minor fluctuations in accuracy using KFold cross validation

I am testing a multi-label classification problem using textual features. I have a total of 1503 text documents. My model shows slight variations in the results each time I run the script manually. I am not sure if my model overfits or if this is…
0
votes
1 answer

Number of features in text mining

I am trying to make a predictive model based on text mining. I am confused how many features should I set up in my model. I have 1000 document in my analysis (so corpus will take around 700). Number of terms in corpus is around 20 000, so it exceeds…
Arthur G.
  • 93
  • 1
  • 10
0
votes
1 answer

Classification Algorithm for text using R

I wanted to predict class of new document using historical data of text "description" and "class" Below script I am using , but for new document which I want to predict I am not getting better accuracy , can anyone help me to know which algorithm…
user3734568
  • 1,311
  • 2
  • 22
  • 36
0
votes
0 answers

Text Classifier with OpenNLP

I am searching an implementation of a text classifier with OpenNLP in Java. The classes should be user defined, e.g., by learning from a training data set containing sentences with their class. The only project I had found that goes into the right…
0
votes
1 answer

AttributeError: 'NoneType' object has no attribute 'items' for classifier = nltk.NaiveBayesClassifier.train(training_set)

I am getting this error for AttributeError: 'NoneType' object has no attribute 'items. The code is as follows: import nltk import random from nltk.corpus import movie_reviews from nltk.classify import NaiveBayesClassifier from nltk.probability…
Satyam
  • 393
  • 1
  • 5
  • 16
0
votes
1 answer

How to generate labels for scientific texts using a limited data set?

I'm beginning to work on my ML course's project which is to classify a scientific text and label it as if its topic is "A" or not. The problem I'm having is that they have provided me with a limited data set. Usually, scientific texts make use of…
0
votes
0 answers

Test instances always classified in to the the same class

I have written a java program that build a J48 classifier using training set. Then I pass a test instance to the classifier to predict the unknown class. But each and every test instance classified in to first element of the class attribute.…
dennypanther
  • 53
  • 1
  • 10
0
votes
1 answer

New data with R and tm when tf-idf is used

Using R and tm, I have loaded and cleaned up a bunch of text documents, and made them into a Corpus. After that, I built their DTM using tf-idf, and that I can use for all kind of classification-clustering algorithms. So far, so good. Now, let's…
user2345448
  • 159
  • 2
  • 11
0
votes
1 answer

Identifying positivity and negativity separately from a negative dataset

First of all, i would like you to know that I am new to machine learning (ML). I am working on a project which detects how positive or negative a set of words can be, therefore i have created a database containing possible negative words. So that ML…
0
votes
2 answers

supervised tag suggestion for documents

I have thousands of documents with associated tag information. However i also have many documents without tags. I want to train a model on the documents WITH tags and then apply the trained classifier to the UNTAGGED documents; the classifier will…
pwhc
  • 516
  • 1
  • 6
  • 16
0
votes
2 answers

How to use glmnet in R for classfication of multiple classes

I'm trying to figure out how to use glmnet to classify a text. I managed to get it working for two classes using family="binomial" type.measure="auc" I wanted to do the same for multiple-classes using multinomial family. I tried something like…
Khenrix
  • 11
  • 1
  • 7
0
votes
1 answer

Features in KNN in a plane

i'm studying a bit of ML and i got stucked. Suppose i want to do some text classification using k neighbours. I use tfidf vectorizer to create a Matrix term-document where for each Cell is stored the tf-idf value. Now, how can i plot points on the…
rollotommasi
  • 461
  • 1
  • 6
  • 11
0
votes
1 answer

Multiclass text classification: new class if input does not match to a class

I am trying to classify pieces of text to categories. I have 9 categories but the given sentences i have can be classify to more categories. My objective is to take a piece of text and find the industry of each sentence, one common problem i have is…
0
votes
1 answer

Data csv file into different text files with Python

I'm a beginner in programming, but for a Dutch text categorization experiment I want to turn every instance (row) of a csv file into separate .txt files, so that the texts can be analyzed by a NLP tool. My csv looks like this. As you can see, each…
Bambi
  • 715
  • 2
  • 8
  • 19