Questions tagged [classification]

In machine learning and statistics, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership (label) is known.

In machine learning and statistics, classification refers to the problem of predicting category memberships based on a set of pre-labeled examples. It is thus a type of supervised learning.

Some of the most important classification algorithms are support vector machines, logistic regression, naive Bayes, random forest and artificial neural networks.

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as regression. The unsupervised counterpart to classification is known as clustering (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.
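As a rough illustration of the supervised setup described above, here is a minimal sketch that learns from pre-labeled examples and then predicts the category of new observations. It uses a nearest-centroid rule as a simple stand-in, not one of the algorithms named above, and the function names and toy data are made up for the example.

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, y):
    """Compute one mean vector (centroid) per class label."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Toy training set (hypothetical): two features, two known categories.
X_train = [(1.0, 1.2), (0.8, 1.0), (4.0, 4.2), (4.2, 3.9)]
y_train = ["small", "small", "large", "large"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, (1.1, 0.9)))  # small
print(predict(centroids, (3.8, 4.1)))  # large
```

The same fit-then-predict shape applies to the algorithms listed above; only the rule learned from the labeled data changes.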

7859 questions
32 votes, 1 answer

How to compute error rate from a decision tree?

Does anyone know how to calculate the error rate for a decision tree with R? I am using the rpart() function.
teo6389
32 votes, 5 answers

Precision/recall for multiclass-multilabel classification

I'm wondering how to calculate precision and recall measures for multiclass multilabel classification, i.e. classification where there are more than two labels, and where each instance can have multiple labels?
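One possible way to approach this, sketched under the assumption that each instance's labels are given as a Python set: count true positives, false positives and false negatives per label, then either pool the counts (micro-averaging) or average the per-label scores (macro-averaging). The helper names and the toy data below are hypothetical.

```python
def per_label_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per label."""
    labels = sorted(set().union(*y_true, *y_pred))
    counts = {label: {"tp": 0, "fp": 0, "fn": 0} for label in labels}
    for true_set, pred_set in zip(y_true, y_pred):
        for label in labels:
            if label in pred_set and label in true_set:
                counts[label]["tp"] += 1
            elif label in pred_set:
                counts[label]["fp"] += 1
            elif label in true_set:
                counts[label]["fn"] += 1
    return counts


def safe_div(num, den):
    """Return 0.0 instead of raising when a label has no relevant instances."""
    return num / den if den else 0.0


def micro_precision_recall(counts):
    """Pool the counts over all labels, then compute precision and recall."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return safe_div(tp, tp + fp), safe_div(tp, tp + fn)


def macro_precision_recall(counts):
    """Compute precision and recall per label, then take the unweighted mean."""
    precisions = [safe_div(c["tp"], c["tp"] + c["fp"]) for c in counts.values()]
    recalls = [safe_div(c["tp"], c["tp"] + c["fn"]) for c in counts.values()]
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)


# Hypothetical data: three instances, three possible labels.
y_true = [{"a", "b"}, {"b"}, {"c"}]
y_pred = [{"a"}, {"b", "c"}, {"c"}]

counts = per_label_counts(y_true, y_pred)
micro = micro_precision_recall(counts)  # (0.75, 0.75)
macro = macro_precision_recall(counts)
print(micro, macro)
```

Micro-averaging weights every label decision equally, so frequent labels dominate; macro-averaging weights every label equally regardless of how often it occurs.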
32 votes, 4 answers

General approach to developing an image classification algorithm for Dilbert cartoons

As a self-development exercise, I want to develop a simple classification algorithm that, given a particular cell of a Dilbert cartoon, is able to identify which characters are present in the cartoon (Dilbert, PHB, Ratbert etc.). I assume the best…
32 votes, 6 answers

how to implement tensorflow's next_batch for own data

In the TensorFlow MNIST tutorial the mnist.train.next_batch(100) function comes in very handy. I am now trying to implement a simple classification myself. I have my training data in a NumPy array. How could I implement a similar function for my own…
timbmg
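A minimal sketch of a next_batch-style helper, assuming the features and labels are index-aligned sequences. It uses plain Python lists so it runs without extra dependencies, and the class name BatchIterator is made up for this example; it is not the TensorFlow tutorial's implementation.

```python
import random


class BatchIterator:
    """Serve shuffled mini-batches from index-aligned features and labels."""

    def __init__(self, features, labels, seed=0):
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = features
        self.labels = labels
        self._rng = random.Random(seed)
        self._order = list(range(len(features)))
        self._rng.shuffle(self._order)
        self._pos = 0

    def next_batch(self, batch_size):
        """Return the next batch; reshuffle and restart when the epoch runs out."""
        if self._pos + batch_size > len(self._order):
            # Not enough samples left: drop the remainder and start a new epoch.
            self._rng.shuffle(self._order)
            self._pos = 0
        idx = self._order[self._pos:self._pos + batch_size]
        self._pos += batch_size
        return [self.features[i] for i in idx], [self.labels[i] for i in idx]


# Hypothetical data: 10 samples whose label is the parity of the first feature.
features = [[i, i * 2] for i in range(10)]
labels = [i % 2 for i in range(10)]

data = BatchIterator(features, labels)
batch_x, batch_y = data.next_batch(4)
print(batch_x, batch_y)
```

With NumPy arrays, the two list comprehensions could be replaced by index-array lookups such as features[idx], provided idx is converted to an array first.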
31 votes, 3 answers

Predicting how long a scikit-learn classification will take to run

Is there a way to predict how long it will take to run a classifier from scikit-learn based on the parameters and dataset? I know, pretty meta, right? Some classifier/parameter combinations are quite fast, and some take so long that I eventually…
ntaggart
30 votes, 3 answers

What is the difference between sample weight and class weight options in scikit-learn?

I have a class imbalance problem and want to solve it using cost-sensitive learning. The options are to undersample and oversample, or to give weights to classes to use a modified loss function. Question: scikit-learn has 2 options called class weights and sample…
WonderWomen
29 votes, 4 answers

What is weakly supervised learning (bootstrapping)?

I understand the differences between supervised and unsupervised learning: Supervised Learning is a way of "teaching" the classifier, using labeled data. Unsupervised Learning lets the classifier "learn by itself", for example, using clustering. But…
Cheshie
28 votes, 8 answers

Easy way of counting precision, recall and F1-score in R

I am using an rpart classifier in R. I want to test the trained classifier on test data. This is fine - I can use the predict.rpart function. But I also want to calculate precision, recall and F1 score. My question is - do…
Karel Bílek
28 votes, 4 answers

Compute class weight function issue in 'sklearn' library when used in 'Keras' classification (Python 3.8, only in VS code)

The classifier script I wrote is working fine, and I recently added weight balancing to the fitting. Since I added the weight estimate function using the 'sklearn' library, I get the following error: compute_class_weight() takes 1 positional argument but 3…
PCG
28 votes, 3 answers

Getting the accuracy for multi-label prediction in scikit-learn

In a multilabel classification setting, sklearn.metrics.accuracy_score only computes the subset accuracy (3): i.e. the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true. This way of computing the…
Franck Dernoncourt
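To make the distinction in the excerpt above concrete, here is a small sketch in plain Python, assuming labels are encoded as binary indicator rows. It contrasts subset accuracy (the whole label set must match) with a per-label score that gives partial credit. The function names and data are hypothetical, and the second metric is only one of several possible alternatives.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label set matches y_true exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual label slots predicted correctly (1 - Hamming loss)."""
    correct = sum(
        ti == pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p)
    )
    total = sum(len(t) for t in y_true)
    return correct / total


# Hypothetical data: 3 samples, 3 possible labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]

print(subset_accuracy(y_true, y_pred))     # 1 of 3 rows matches exactly
print(per_label_accuracy(y_true, y_pred))  # 7 of 9 label slots are correct
```

Only the first row matches exactly, so subset accuracy is 1/3, even though 7 of the 9 individual label decisions are right.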
28 votes, 3 answers

adding words to stop_words list in TfidfVectorizer in sklearn

I want to add a few more words to stop_words in TfidfVectorizer. I followed the solution in "Adding words to scikit-learn's CountVectorizer's stop list". My stop word list now contains both 'english' stop words and the stop words I specified. But…
ac11
28 votes, 1 answer

BaseEstimator in sklearn.base (Python)

I've been learning and practicing the sklearn library on my own. When I participated in Kaggle competitions, I noticed the provided sample code used BaseEstimator from sklearn.base. I don't quite understand how/why BaseEstimator is used. from sklearn.base…
neghez
27 votes, 11 answers

Detecting an online poker cheat

It recently emerged on a large poker site that some players were possibly able to see all opponents' cards as they played by exploiting a security vulnerability that was discovered. A naïve cheater would win at an incredibly fast rate, and these…
Tom Gullen
27 votes, 1 answer

How to interpret almost perfect accuracy and AUC-ROC but zero f1-score, precision and recall

I am training an ML logistic classifier to classify two classes using Python scikit-learn. The data are extremely imbalanced (about 14300:1). I'm getting almost 100% accuracy and ROC-AUC, but 0% in precision, recall, and F1 score. I understand…
KubiK888
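A small worked example of how the accuracy part of this can arise, assuming a hypothetical classifier that always predicts the majority class on data with the 14300:1 ratio mentioned above. It is a sketch of the arithmetic only, not a diagnosis of the asker's model; ROC-AUC is computed from ranked scores rather than thresholded labels, so it is not covered here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


# Hypothetical data matching the 14300:1 imbalance from the question.
y_true = [0] * 14300 + [1]
y_pred = [0] * len(y_true)  # assumed classifier: always predicts the majority class

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.5f} precision={precision} recall={recall}")
```

Here accuracy is 14300/14301, while precision and recall are both zero because the single positive instance is never predicted.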
26 votes, 4 answers

K Nearest-Neighbor Algorithm

Using the KNN algorithm, say k=5. Now I try to classify an unknown object by getting its 5 nearest neighbours. What should I do if, after determining the 4 nearest neighbours, the next 2 (or more) nearest objects have the same distance? Which object of…
Gwaihir
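One convention that could be used for the tie described above is to let every object tied with the k-th nearest distance vote, so the neighbourhood may hold more than k objects. The sketch below assumes Euclidean distance and exact equality of distances; the function name and the data are hypothetical, and other conventions (for example breaking ties at random) are equally defensible.

```python
from collections import Counter
from math import dist


def knn_predict_with_ties(train, query, k):
    """Vote among the k nearest points plus any points tied at the k-th distance."""
    ranked = sorted((dist(point, query), label) for point, label in train)
    cutoff = ranked[k - 1][0]
    # Keep every point at or inside the k-th distance, so ties are not cut off.
    voters = [label for d, label in ranked if d <= cutoff]
    winner = Counter(voters).most_common(1)[0][0]
    return winner, len(voters)


# Hypothetical data: with k=3, three "B" points are tied at distance 2.0.
train = [
    ((1, 0), "A"),
    ((0, -1), "A"),
    ((2, 0), "B"),
    ((0, 2), "B"),
    ((-2, 0), "B"),
    ((5, 5), "A"),
]

label, n_voters = knn_predict_with_ties(train, (0, 0), k=3)
print(label, n_voters)  # B 5
```

In this toy case, cutting the list at exactly three neighbours would give "A", while including all tied points gives "B", which shows why the choice of convention matters. With real-valued features, a small tolerance would be needed instead of exact equality, and a tie in the vote itself is not handled here.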