Questions tagged [classification]

In machine learning and statistics, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership (label) is known.

Because the category of every training example is known in advance, classification is a type of supervised learning.

Some of the most important classification algorithms are support vector machines, logistic regression, naive Bayes, random forests, and artificial neural networks.

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as regression. The unsupervised counterpart to classification is known as clustering (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.
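As a minimal sketch of the supervised setting described above, the following pure-Python nearest-centroid classifier learns one centroid per label from pre-labeled examples and assigns a new observation to the label of the closest centroid. The toy data and the helper names `fit_centroids` and `predict` are made up for illustration, and nearest-centroid is not one of the algorithms listed above; it is used here only because it fits in a few lines.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Compute the mean point (centroid) of each labeled class."""
    grouped = defaultdict(list)
    for point, label in zip(samples, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(coord) / len(points) for coord in zip(*points))
        for label, points in grouped.items()
    }


def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical training set: two features per observation, labels known.
train_x = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
train_y = ["small", "small", "large", "large"]

centroids = fit_centroids(train_x, train_y)
print(predict(centroids, (2.0, 1.0)))
print(predict(centroids, (7.0, 9.0)))
```

Predicting a continuous value instead of a label would turn this into a regression problem, and dropping `train_y` and discovering the groups from `train_x` alone would make it clustering.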

7859 questions

5 answers

Controlling the threshold in Logistic Regression in Scikit Learn

I am using the LogisticRegression() method in scikit-learn on a highly unbalanced data set. I have even turned the class_weight feature to auto. I know that in Logistic Regression it should be possible to know what is the threshold value for a…

python machine-learning scikit-learn classification logistic-regression

asked Feb 25 '15 at 10:11

London guy

2 answers

Correlated features and classification accuracy

I'd like to ask everyone a question about how correlated features (variables) affect the classification accuracy of machine learning algorithms. With correlated features I mean a correlation between them and not with the target class (i.e the…

machine-learning classification correlation feature-selection

asked Feb 11 '13 at 14:18

Titus Pullo

3 answers

What is the difference between OneVsRestClassifier and MultiOutputClassifier in scikit learn?

Can someone please explain (with example maybe) what is the difference between OneVsRestClassifier and MultiOutputClassifier in scikit-learn? I've read documentation and I've understood that we use: OneVsRestClassifier - when we want to do…

python scikit-learn classification multilabel-classification multiclass-classification

asked Mar 15 '17 at 19:54

PeterB

4 answers

Multilabel-indicator is not supported for confusion matrix

multilabel-indicator is not supported is the error message I get, when trying to run: confusion_matrix(y_test, predictions) y_test is a DataFrame which is of shape: Horse | Dog | Cat 1 0 0 0 1 0 0 1 0 ... ... …

python numpy scikit-learn classification

asked Oct 26 '17 at 12:09

Khaine775

5 answers

Scikit-learn confusion matrix

I can't figure out if I've set up my binary classification problem correctly. I labeled the positive class 1 and the negative 0. However, it is my understanding that by default scikit-learn uses class 0 as the positive class in its confusion matrix…

python machine-learning scikit-learn classification

asked Feb 03 '16 at 13:35

OAK

5 answers

What is "naive" in a naive Bayes classifier?

What is naive about Naive Bayes?

algorithm classification naivebayes

asked May 16 '12 at 08:32

Peddler

3 answers

Dealing with unbalanced datasets in Spark MLlib

I'm working on a particular binary classification problem with a highly unbalanced dataset, and I was wondering if anyone has tried to implement specific techniques for dealing with unbalanced datasets (such as SMOTE) in classification problems…

apache-spark machine-learning classification apache-spark-mllib

asked Oct 27 '15 at 16:04

dbakr

3 answers

What is a threshold in a Precision-Recall curve?

I am aware of the concept of Precision as well as the concept of Recall. But I am finding it very hard to understand the idea of a 'threshold' which makes any P-R curve possible. Imagine I have a model to build that predicts the re-occurrence (yes…

machine-learning classification auc precision-recall model-comparison

asked Sep 14 '17 at 17:03

Mr.A

4 answers

Recommended anomaly detection technique for simple, one-dimensional scenario?

I have a scenario where I have several thousand instances of data. The data itself is represented as a single integer value. I want to be able to detect when an instance is an extreme outlier. For example, with the following example data: a = 10 b…

machine-learning classification

asked Feb 20 '10 at 20:05

Grundlefleck

4 answers

Unbalanced classification using RandomForestClassifier in sklearn

I have a dataset where the classes are unbalanced. The classes are either '1' or '0' where the ratio of class '1':'0' is 5:1. How do you calculate the prediction error for each class and the rebalance weights accordingly in sklearn with Random…

python machine-learning classification scikit-learn random-forest

asked Nov 19 '13 at 21:41

mlo

1 answer

Difference between Objective and feval in xgboost

What is the difference between objective and feval in xgboost in R? I know this is something very fundamental, but I am unable to define exactly what they are or what their purpose is. Also, what is a softmax objective when doing multiclass classification?

r classification xgboost objective-function evaluation-function

asked Dec 09 '15 at 11:58

user2393294

1 answer

What are the 15 classifications of types in C++?

During a CppCon2014 conference talk by Walter E. Brown, he states that there are 15 classifications of types in C++ that the standard describes. "15 partitions of the universe of C++ types." "void is one of them." -- Walter E. Brown. What are…

c++ c++11 types classification categories

asked Nov 20 '14 at 06:00

Trevor Hickey

1 answer

How to engineer features for machine learning

Do you have any advice or reading on how to engineer features for a machine learning task? Good input features are important even for a neural network. The chosen features will affect the needed number of hidden neurons and the needed number of…

artificial-intelligence machine-learning neural-network classification pattern-recognition

asked Apr 20 '10 at 10:55

Ivo Danihelka

2 answers

Are GAN's unsupervised or supervised?

I hear from some sources that generative adversarial networks are unsupervised ML, but I don't get it. Are generative adversarial networks not in fact supervised? 1) 2-class case Real-against-Fake Indeed one has to supply training data to the…

machine-learning neural-network classification

asked Jun 08 '17 at 21:22

scrimau

7 answers

sklearn LogisticRegression and changing the default threshold for classification

I am using LogisticRegression from the sklearn package, and have a quick question about classification. I built a ROC curve for my classifier, and it turns out that the optimal threshold for my training data is around 0.25. I'm assuming that the…

python machine-learning scikit-learn regression classification

asked Jul 14 '15 at 21:12

Chetan Prabhu
