Questions tagged [text-classification]

Simply put, text classification is about assigning a piece of text to one or more categories from a (mostly predefined) set. It is an important problem that arises in many real-world applications. For example, an automated call centre might want to sort incoming complaints automatically into the most appropriate bucket of problems.

Text classification is a special case of the more general problem of classification. Here, the input is a piece of text (rather than images, sounds, videos, etc.). The output could be:

  • one of two categories (binary classification)
  • one category out of k possible categories (multi-class classification)
  • a set of categories out of k possible categories (multi-label classification).

In text classification, the features extracted from the text are usually sparse (rather than dense, as in image classification).
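
To illustrate why text features tend to be sparse, here is a minimal sketch of a bag-of-words representation in plain Python. The helper names (`build_vocabulary`, `to_sparse_vector`) and the sample complaints are made up for illustration, and whitespace tokenisation is a simplifying assumption; real pipelines usually rely on a library vectoriser.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token to a column index (hypothetical helper)."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_sparse_vector(doc, vocab):
    """Return {column index: count}, storing only the non-zero entries."""
    counts = Counter(t for t in doc.lower().split() if t in vocab)
    return {vocab[t]: c for t, c in counts.items()}


# Made-up call-centre complaints, purely for illustration.
docs = [
    "my internet connection keeps dropping",
    "i was charged twice on my bill",
    "the router light keeps blinking red",
]

vocab = build_vocabulary(docs)
vectors = [to_sparse_vector(d, vocab) for d in docs]

# Fraction of non-zero cells in the full documents-by-vocabulary matrix.
density = sum(len(v) for v in vectors) / (len(docs) * len(vocab))

for doc, vec in zip(docs, vectors):
    print(f"{len(vec)} of {len(vocab)} columns are non-zero for: {doc!r}")
print(f"matrix density: {density:.3f}")
```

Each document touches only the few columns for the words it contains, and as the vocabulary grows with the corpus, the fraction of non-zero entries generally shrinks further. This is why such features are typically stored in sparse formats.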

1694 questions
0
votes
1 answer

building a sklearn text classifier and converting it with coremltools

I want to build a text classifier with sklearn and then convert it to an iOS 11 machine learning file using the coremltools package. I've built three different classifiers with Logistic Regression, Random Forest, and Linear SVC and all of them work fine in…
Saeed Esmaili
0
votes
0 answers

How to classify data basing on n-grams

I have the following dataset, which consists of malware categories and their corresponding API calls. The API call column contains a string of words. Based on those strings, I need a classifier that can classify each category accordingly. Here is the…
0
votes
1 answer

Hold out sample when loading data in Scikit-Learn with sklearn.datasets.load_files

I'm experimenting with a simple Naive Bayes with Scikit-learn. Essentially, I've got two folders, respectively named Cat A and Cat B, each consisting of circa 1,500 text files. I'm loading these files in order to train the classifier like…
DanielH
0
votes
1 answer

Top m topics in a collection of comments

I have a collection of comments and each comment discusses a topic. I want to figure out the top m topics discussed in these comments. Also, I am receiving these comments in an online fashion (i.e. I don't get all the comments in one go, instead I…
0
votes
1 answer

How to load separate textual attributes in weka TextDirectoryLoader?

I am using the Java API of Weka to classify documents according to different textual features. When using the TextDirectoryLoader class I am able to load a directory with txt files containing some text, transform the text to numerical feature and…
KLaz
0
votes
1 answer

Searching for list of terms using Google in order to build a bag-of-words for a particular category

I am having a hard time understanding the process of building a bag-of-words. This will be a multiclass classification supervised machine learning problem wherein a webpage or a piece of text is assigned to one category from multiple pre-defined…
0
votes
1 answer

Sklearn SGDC partial_fit ValueError: classes should include all valid labels that can be in y

I loaded an already trained SGDC model and tried to partial_fit it again with a new feature set and data, but received ValueError: classes should include all valid labels that can be in y. My class_weights = None and I wanted to have each class equal…
0
votes
1 answer

Split text files into two groups - unsupervised learning

Imagine you are a librarian and over time you have classified a bunch of text files (approx 100) with a general ambiguous keyword. Every text file is actually a topic of keyword_meaning1 or a topic of keyword_meaning2. Which unsupervised learning…
xralf
0
votes
1 answer

Combining Word vectors and Scalar Features for classification

I am working on a short sentence classification problem where I get the following information. Input: age of the person (1-100), gender of the person (Male or Female), content of the sentence. Output: label (type of content). To model the sentences I'm…
0
votes
1 answer

Word2vec classification and clustering tensorflow

I am trying to cluster some sentences using similarity (maybe cosine) and then maybe use a classifier to put text in predefined classes. My idea is to use tensorflow to generate the word embeddings, then average them for each sentence. Next use a…
LonsomeHell
0
votes
1 answer

compare text in object javascript

I want to set data in an object with 2 labels, positive and negative, and I want to set words into the object. I tried this code: function cok(_class, doc) { var vocab = { po: { wd: "good job" }, ne: { …
user7157681
0
votes
5 answers

multi-label text classification with zero or more labels

I need to classify website text with zero or more categories/labels (5 labels such as finance, tech, etc). My problem is handling text that isn't one of these labels. I tried ML libraries (maxent, naive bayes), but they match "other" text…
0
votes
1 answer

SVM value error text classification

I've gone through Scikit-SVM tutorial, and written the code to train and test. But I'm facing an issue with prediction, where it says, 'shape should be equal to training shape'. Here is the code below. EDIT1: Sample Data ERROR_DESC …
user6083088
0
votes
0 answers

Text Classification for Python- Nonetype Error

I am working on a basic project with Python regarding Text Classification. I am using nltk, and I have imported its Brown Corpus. While trying to classify one group as "positive" and the other group as "negative", I am getting a nonetype error. This…
Elizabeth
0
votes
2 answers

How to remove HTML, Urls from with Python

I have this list of XML files. Now I have to filter some labels out of it. The problem is the text: there is a lot of HTML markup and URLs in it and I need plain text. I would like to remove these elements in a loop and then append the cleaned text…
Bambi