Questions tagged [classification]

In machine learning and statistics, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership (label) is known.

In machine learning and statistics, classification refers to the problem of predicting category memberships based on a set of pre-labeled examples. It is thus a type of supervised learning.

Some of the most important classification algorithms are support vector machines, logistic regression, naive Bayes, random forest and artificial neural networks.

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as regression. The unsupervised counterpart to classification is known as clustering (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.
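As a rough illustration of the description above, here is a minimal sketch of supervised classification in plain Python. It uses a nearest-centroid rule, which is not one of the algorithms listed above and was chosen only because it fits in a few lines. The function names `fit_centroids` and `predict` and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Average the training points of each class into one centroid per label."""
    groups = defaultdict(list)
    for point, label in zip(samples, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(coord) / len(points) for coord in zip(*points))
        for label, points in groups.items()
    }


def predict(centroids, point):
    """Assign a new observation to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical pre-labeled training set: two numeric features per observation.
train_X = [(20.0, 4.0), (25.0, 5.0), (60.0, 30.0), (70.0, 35.0)]
train_y = ["cat", "cat", "dog", "dog"]

centroids = fit_centroids(train_X, train_y)
prediction = predict(centroids, (22.0, 4.5))  # a new, unlabeled observation
```

The training step only sees observations whose labels are known, and the prediction step assigns one of those labels to a new observation, which is what makes this a supervised problem.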

7859 questions
41 votes, 5 answers

Controlling the threshold in Logistic Regression in Scikit Learn

I am using the LogisticRegression() method in scikit-learn on a highly unbalanced data set. I have even turned the class_weight feature to auto. I know that in Logistic Regression it should be possible to know what is the threshold value for a…
40 votes, 2 answers

Correlated features and classification accuracy

I'd like to ask everyone a question about how correlated features (variables) affect the classification accuracy of machine learning algorithms. By correlated features I mean features that are correlated with each other, not with the target class (i.e. the…
39 votes, 3 answers

What is the difference between OneVsRestClassifier and MultiOutputClassifier in scikit learn?

Can someone please explain (with an example, maybe) the difference between OneVsRestClassifier and MultiOutputClassifier in scikit-learn? I've read the documentation and I understand that we use: OneVsRestClassifier - when we want to do…
38 votes, 4 answers

Multilabel-indicator is not supported for confusion matrix

"multilabel-indicator is not supported" is the error message I get when trying to run: confusion_matrix(y_test, predictions). y_test is a DataFrame of shape: Horse | Dog | Cat 1 0 0 0 1 0 0 1 0 ... ... …
Khaine775
37 votes, 5 answers

Scikit-learn confusion matrix

I can't figure out if I've set up my binary classification problem correctly. I labeled the positive class 1 and the negative class 0. However, it is my understanding that by default scikit-learn uses class 0 as the positive class in its confusion matrix…
OAK
37 votes, 5 answers

What is "naive" in a naive Bayes classifier?

What is naive about Naive Bayes?
Peddler
36 votes, 3 answers

Dealing with unbalanced datasets in Spark MLlib

I'm working on a particular binary classification problem with a highly unbalanced dataset, and I was wondering if anyone has tried to implement specific techniques for dealing with unbalanced datasets (such as SMOTE) in classification problems…
35 votes, 3 answers

What is a threshold in a Precision-Recall curve?

I am aware of the concept of Precision as well as the concept of Recall. But I am finding it very hard to understand the idea of a 'threshold' which makes any P-R curve possible. Imagine I have a model to build that predicts the re-occurrence (yes…
35 votes, 4 answers

Recommended anomaly detection technique for simple, one-dimensional scenario?

I have a scenario where I have several thousand instances of data. The data itself is represented as a single integer value. I want to be able to detect when an instance is an extreme outlier. For example, with the following example data: a = 10 b…
Grundlefleck
35 votes, 4 answers

Unbalanced classification using RandomForestClassifier in sklearn

I have a dataset where the classes are unbalanced. The classes are either '1' or '0', and the ratio of class '1' to class '0' is 5:1. How do you calculate the prediction error for each class and then rebalance the weights accordingly in sklearn with Random…
34 votes, 1 answer

Difference between Objective and feval in xgboost

What is the difference between objective and feval in xgboost in R? I know this is something very fundamental, but I am unable to define exactly what they are or what their purpose is. Also, what is a softmax objective when doing multi-class classification?
34 votes, 1 answer

What are the 15 classifications of types in C++?

During a CppCon2014 conference talk by Walter E. Brown, he states that there are 15 classifications of types in C++ that the standard describes. "15 partitions of the universe of C++ types." "void is one of them." -- Walter E. Brown. What are…
Trevor Hickey
34 votes, 1 answer

How to engineer features for machine learning

Do you have any advice or recommended reading on how to engineer features for a machine learning task? Good input features are important even for a neural network. The chosen features will affect the needed number of hidden neurons and the needed number of…
33 votes, 2 answers

Are GAN's unsupervised or supervised?

I hear from some sources that generative adversarial networks are unsupervised ML, but I don't get it. Are generative adversarial networks not in fact supervised? 1) 2-class case, real against fake: indeed one has to supply training data to the…
scrimau
33 votes, 7 answers

sklearn LogisticRegression and changing the default threshold for classification

I am using LogisticRegression from the sklearn package, and have a quick question about classification. I built a ROC curve for my classifier, and it turns out that the optimal threshold for my training data is around 0.25. I'm assuming that the…