Questions tagged [text-classification]

Simply put, text classification means assigning a piece of text to one or more of a set of (mostly predefined) categories. It is an important problem that occurs in many real-world applications. For example, an automated call centre might want to categorise complaints automatically into the most appropriate bucket of problems.

Text classification is a sub-problem of the more general problem of classification. Here the input is a piece of text (rather than an image, a sound, a video, etc.). The output could be:

one of two categories (binary)
one category out of k possible categories (multi-class)
a set of categories out of k possible categories (multi-label)
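
As a rough illustration of the three output types, here is a minimal Python sketch. The category names and the helper `to_indicator` are hypothetical, chosen only to echo the call-centre example; they do not come from any particular library.

```python
# Hypothetical category set (k = 3), for illustration only.
CATEGORIES = ["billing", "network", "hardware"]

# Binary: a single yes/no decision per text.
binary_label = True  # e.g. "is this text a complaint?"

# Multi-class: exactly one category out of k, stored as its index.
multiclass_label = CATEGORIES.index("network")

# Multi-label: any subset of the k categories (possibly empty),
# encoded as a 0/1 indicator vector of length k.
def to_indicator(labels, categories=CATEGORIES):
    return [1 if category in labels else 0 for category in categories]

multilabel = to_indicator({"billing", "hardware"})
no_labels = to_indicator(set())
```

The indicator-vector encoding is one common way to represent a multi-label target; an all-zero vector stands for a text that belongs to none of the categories.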

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
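
To see why such features tend to be sparse, consider a bag-of-words representation, sketched below in plain Python. The sample texts and the function names `build_vocabulary` and `to_sparse_counts` are assumptions made for this example, and whitespace tokenisation is a deliberate simplification.

```python
from collections import Counter

def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_sparse_counts(text, vocab):
    """Return {column index: count}, storing only the non-zero entries."""
    counts = Counter(t for t in text.lower().split() if t in vocab)
    return {vocab[token]: n for token, n in counts.items()}

# Hypothetical training texts.
texts = [
    "my phone bill is wrong",
    "the network is down again",
    "phone screen is cracked",
]
vocab = build_vocabulary(texts)
vector = to_sparse_counts("the bill is wrong wrong", vocab)

# Fraction of vocabulary columns that are non-zero for this one text.
density = len(vector) / len(vocab)
```

Each text touches only a few columns of the vocabulary, and the fraction shrinks as the vocabulary grows, which is why libraries typically store such vectors in a sparse format.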

1694 questions

1 answer

building a sklearn text classifier and converting it with coremltools

I want to build a text classifier with sklearn and then convert it to an iOS 11 machine learning file using the coremltools package. I've built three different classifiers with Logistic Regression, Random Forest, and Linear SVC and all of them work fine in…

python scikit-learn text-classification

asked Jun 08 '17 at 12:39

Saeed Esmaili

0 answers

How to classify data based on n-grams

I have the following dataset, which consists of malware categories and their corresponding API calls. The API call column contains a string of words. Based on those strings, I need a classifier to be able to classify each category accordingly. Here is the…

python machine-learning scikit-learn text-classification n-gram

asked Jun 07 '17 at 13:12

ninyesiga

1 answer

Hold out sample when loading data in Scikit-Learn with sklearn.datasets.load_files

I'm experimenting with a simple Naive Bayes classifier in Scikit-learn. Essentially, I've got two folders, named Cat A and Cat B respectively, each consisting of circa 1,500 text files. I'm loading these files in order to train the classifier like…

python scikit-learn text-classification

asked Jun 02 '17 at 16:21

DanielH

1 answer

How to load separate textual attributes in weka TextDirectoryLoader?

I am using the Java API of Weka to classify documents according to different textual features. When using the TextDirectoryLoader class I am able to load a directory with txt files containing some text, transform the text to numerical features and…

java machine-learning weka text-classification

asked May 28 '17 at 01:40

KLaz

1 answer

Searching for list of terms using Google in order to build a bag-of-words for a particular category

I am having a hard time understanding the process of building a bag-of-words. This will be a multiclass classification supervised machine learning problem wherein a webpage or a piece of text is assigned to one category from multiple pre-defined…

machine-learning text-classification supervised-learning multiclass-classification

asked May 27 '17 at 11:53

user6753522

1 answer

Sklearn SGDC partial_fit ValueError: classes should include all valid labels that can be in y

I loaded an already trained SGDC model and tried to partial_fit it again with a new feature set and data, but received ValueError: classes should include all valid labels that can be in y. My class_weights = None and I wanted to have each class equal…

python machine-learning scikit-learn svm text-classification

asked May 19 '17 at 17:27

Chetan Kabra

1 answer

Split text files into two groups - unsupervised learning

Imagine you are a librarian and over time you have classified a bunch of text files (approx. 100) with a general, ambiguous keyword. Every text file is actually a topic of keyword_meaning1 or a topic of keyword_meaning2. Which unsupervised learning…

text-classification unsupervised-learning

asked May 18 '17 at 18:46

xralf

1 answer

Combining Word vectors and Scalar Features for classification

I am working on a short sentence classification problem where I get the following information. Input: age of the person (1-100), gender of the person (male or female), content of the sentence. Output: label (type of content). To model the sentences I'm…

machine-learning tensorflow word2vec text-classification feature-selection

asked May 07 '17 at 19:20

chaithu

1 answer

Word2vec classification and clustering tensorflow

I am trying to cluster some sentences using similarity (maybe cosine) and then maybe use a classifier to put text in predefined classes. My idea is to use tensorflow to generate the word embeddings, then average them for each sentence. Next use a…

tensorflow nlp word2vec text-classification

asked May 04 '17 at 14:17

LonsomeHell

1 answer

compare text in object javascript

I want to set data in an object with 2 labels, positive and negative, and I want to set words into the object. I tried this code: function cok(_class, doc) { var vocab = { po: { wd: "good job" }, ne: { …

javascript node.js classification text-classification naivebayes

asked May 02 '17 at 09:37

user7157681

5 answers

multi-label text classification with zero or more labels

I need to classify website text with zero or more categories/labels (5 labels such as finance, tech, etc). My problem is handling text that isn't one of these labels. I tried ML libraries (maxent, naive bayes), but they match "other" text…

machine-learning text-classification multilabel-classification

asked Apr 21 '17 at 18:06

cherpa123

1 answer

SVM value error text classification

I've gone through Scikit-SVM tutorial, and written the code to train and test. But I'm facing an issue with prediction, where it says, 'shape should be equal to training shape'. Here is the code below. EDIT1: Sample Data ERROR_DESC …

python scikit-learn text-classification

asked Apr 20 '17 at 21:41

user6083088

0 answers

Text Classification for Python- Nonetype Error

I am working on a basic project with Python regarding Text Classification. I am using nltk, and I have imported its Brown Corpus. While trying to classify one group as "positive" and the other group as "negative", I am getting a nonetype error. This…

python nltk text-mining text-classification corpus

asked Apr 12 '17 at 23:22

Elizabeth

2 answers

How to remove HTML and URLs from text with Python

I have this list of XML files. Now I have to filter some labels out of it. The problem is the text: there is a lot of HTML markup and URLs in it and I need plain text. I would like to remove these elements in a loop and then append the cleaned text…

python html regex xml text-classification

asked Apr 12 '17 at 15:02

Bambi
