Questions tagged [text-classification]

Simply stated, text classification is the task of assigning a piece of text to one or more of a set of (usually predefined) categories. It is an important problem that arises in many real-world applications. For example, an automated call centre might want to sort incoming complaints into the most appropriate problem category automatically.
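To make the call-centre example concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. Naive Bayes is only one common choice, not a method this tag prescribes. The categories `network` and `billing`, the training sentences, and the helper names `train` and `classify` are all hypothetical, and splitting on whitespace is a simplifying assumption.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Fit multinomial Naive Bayes counts from (text, category) pairs."""
    class_counts = Counter()             # documents per category (for priors)
    word_counts = defaultdict(Counter)   # token counts per category
    vocab = set()
    for text, category in examples:
        tokens = text.lower().split()    # naive whitespace tokenization
        class_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the category with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for category, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[category].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # add-one (Laplace) smoothed log likelihood of the token
            score += math.log(
                (word_counts[category][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best, best_score = category, score
    return best

# Hypothetical labelled complaints.
training = [
    ("my internet keeps dropping", "network"),
    ("no signal since this morning", "network"),
    ("I was charged twice this month", "billing"),
    ("my bill is wrong", "billing"),
]
model = train(training)
print(classify("internet dropping again", model))  # network
print(classify("charged twice", model))            # billing
```

A real system would use a proper tokenizer, far more training data, and usually a library implementation (for example scikit-learn's `MultinomialNB`) rather than hand-rolled counting.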

Text classification is a special case of the more general problem of classification in which the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • one of two categories (binary classification)
  • one category out of k possible categories (multi-class classification)
  • a set of categories out of k possible categories (multi-label classification).
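The three output types differ mainly in how the target is encoded. The plain-Python sketch below shows one common encoding for each: a single 0/1 value for binary, a one-hot vector for multi-class, and a multi-hot vector for multi-label. The category names and function names are hypothetical; libraries provide equivalents (e.g. scikit-learn's `MultiLabelBinarizer`), but none are assumed here.

```python
from __future__ import annotations

def encode_binary(label: str, positive: str) -> int:
    """Binary: the text either belongs to the positive category (1) or not (0)."""
    return 1 if label == positive else 0

def encode_multiclass(label: str, classes: list[str]) -> list[int]:
    """Multi-class: exactly one of k categories, as a one-hot vector of length k."""
    if label not in classes:
        raise ValueError(f"unknown category: {label!r}")
    return [1 if c == label else 0 for c in classes]

def encode_multilabel(labels: set[str], classes: list[str]) -> list[int]:
    """Multi-label: any subset of the k categories, as a multi-hot vector."""
    unknown = labels - set(classes)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in classes]

# Hypothetical call-centre categories (k = 3).
classes = ["billing", "network", "hardware"]

print(encode_binary("spam", positive="spam"))               # 1
print(encode_multiclass("network", classes))                # [0, 1, 0]
print(encode_multilabel({"billing", "hardware"}, classes))  # [1, 0, 1]
```

A multi-label problem with k categories can also be treated as k independent binary problems, one per category, each predicting a single entry of the multi-hot vector.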

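In text classification, the features extracted from the text are usually sparse: each document uses only a small fraction of the vocabulary, so most feature values are zero (in contrast to the dense features typical of image classification).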
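A minimal sketch of a sparse bag-of-words representation in plain Python: each document is stored as a `{column_index: count}` dictionary, so zero entries are never materialized, and the density (fraction of non-zero document/term cells) is measured at the end. The toy documents and the helper names `build_vocabulary` and `to_sparse_vector` are hypothetical, and lowercase whitespace tokenization is a simplifying assumption.

```python
from collections import Counter

def build_vocabulary(docs):
    """Map every distinct lowercase token in the corpus to a column index."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_sparse_vector(doc, vocab):
    """Bag-of-words counts as {column_index: count}; zero counts are omitted."""
    counts = Counter(t for t in doc.lower().split() if t in vocab)
    return {vocab[t]: n for t, n in counts.items()}

docs = [
    "my internet connection keeps dropping",
    "I was charged twice on my bill",
    "the router lights are blinking red",
]
vocab = build_vocabulary(docs)
vectors = [to_sparse_vector(d, vocab) for d in docs]

# Density = non-zero cells / (number of documents * vocabulary size).
nonzero = sum(len(v) for v in vectors)
density = nonzero / (len(docs) * len(vocab))
print(len(vocab), nonzero, round(density, 2))  # 17 18 0.35
```

With only three toy documents about a third of the cells are non-zero; with a realistic vocabulary of tens of thousands of terms, each document fills only a tiny fraction of its row, which is why libraries such as SciPy provide dedicated sparse-matrix formats.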

1694 questions
0 votes, 1 answer

Why do Tensorflow tf.learn classification results vary a lot?

I use the TensorFlow high-level API tf.learn to train and evaluate a DNN classifier for a series of binary text classifications (I actually need multi-label classification, but for now I check each label separately). My code is very similar to…
0 votes, 1 answer

Does LibShortText work with other languages too?

LibShortText is an open-source tool for short-text classification and analysis (http://www.csie.ntu.edu.tw/~cjlin/libshorttext/). I have tried to find out whether it also works with languages other than English (e.g. German), but I didn't find a…
NewbieXXL
0 votes, 1 answer

Deep learning and text analysis / extraction

I am trying to build a deep-learning model to extract specific text from long sentences. Suppose I have a text of 200 words and a table containing my clients' names and surnames. I am trying to build a model to extract from these 200 words…
0 votes, 0 answers

Algorithm to determine group membership

I want to organise objects (books) into groups (works). The data I have for testing membership are the title and author. Often the title and author are formatted slightly differently, such as "Firstname Lastname" or "Lastname. Firstname". …
dkam
0 votes, 1 answer

How to reduce topic classification time in textblob naive bayes classifier

I am using pickle to save a classification model built with Bayes' theorem. After training on 5,600 records, the saved file is 2.1 GB. Loading that file takes nearly 2 minutes, and classifying some text takes 5.5 minutes…
Balaji
0 votes, 2 answers

Encoding data's label for text classification

I am doing a project in clinical text classification. In my corpus, the data are already labelled with codes (for example: 768.2, V13.02, V13.09, 599.0 ...). I have already separated the text from the labels and then used word embeddings for the text. I am going to feed them…
ngoduyvu
0 votes, 1 answer

Classification using SVM

I want to use an SVM to classify text, assigning each test document one of two labels (health/adult). The training and test data are text files, and I am using Python's scikit-learn library. While saving the text to .txt files, I encoded it in…
0 votes, 1 answer

Text Classification/Document Classification with Sequence Tagging with Mallet

I have documents arranged in folders, with each folder representing a class (category). For a new input (such as a question), I have to identify its category. What is the best way to do this using MALLET? I've gone through multiple articles about this, but…
0 votes, 1 answer

Restricting output classes in multi-class classification in Tensorflow

I am building a bidirectional LSTM for multi-class sentence classification. I have 13 classes in total to choose from, and I am multiplying the output of my LSTM network by a matrix of shape [2*num_hidden_unit,num_classes] and then…
user1718064
0 votes, 3 answers

How to get the prominent word in a spam - non spam classifier?

Suppose I have a spam/non-spam email classifier. If a new email is classified as spam, how can I determine which words in the email were mainly responsible for that classification? For example, if an email has the following text…
0 votes, 1 answer

Weka POS tagging + tokenization

I'm new to Weka. I am trying to classify the sentiment of movie reviews. I understand the StringToWordVector filter, which tokenizes the text and turns word occurrences into attributes. I also want to add part-of-speech tags to the attribute vocabulary…
0 votes, 0 answers

Short text syntactic classification

I am a newbie at machine learning and data mining. Here's the problem: I currently have one input variable, a short text made up of non-standard nouns, and I want to classify it into a target category. I have about 40% of the total training data from…
nir
0 votes, 1 answer

Show accuracy for each class in every given test data using sklearn

I have something to ask. I've trained a scikit-learn logistic regression classifier in Python on 10,000 training samples. I have 2,000 test samples, and I use the accuracy score and a confusion matrix to evaluate it, but both only show overall…
0 votes, 2 answers

How to map a coordinate to a word in word2vec

In word2vec, it's common to map each word in the dictionary to a point in an N-dimensional space. Is there any way to reverse this process and synthesize a word from an arbitrary position in the space?
Daniel
0 votes, 1 answer

Text Categorization Python with pre-trained data

How can I associate my TF-IDF matrix with a category? For example, I have the data set below:

| **ID** | **Text** | **Category** |
|--------|----------|--------------|
| 1 | jake loves me more than john loves me | Romance |
| 2 | july… | |
RData