Questions tagged [classification]

In machine learning and statistics, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership (label) is known.

In machine learning and statistics, classification refers to the problem of predicting category memberships based on a set of pre-labeled examples. It is thus a type of supervised learning.

Some of the most important classification algorithms are support vector machines, logistic regression, naive Bayes, random forests, and artificial neural networks.

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as regression. The unsupervised counterpart to classification is known as clustering (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.
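As a rough illustration of the supervised setting described above, here is a minimal sketch of a nearest-centroid classifier in plain Python. It is not one of the algorithms named above; it was chosen only because it fits in a few lines. The function names (`fit_centroids`, `predict`) and the toy data are hypothetical and exist only for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(X, y):
    """Return {label: centroid} computed from labeled training points."""
    groups = defaultdict(list)
    for point, label in zip(X, y):
        groups[label].append(point)
    # The centroid of a class is the coordinate-wise mean of its points.
    return {
        label: tuple(sum(coord) / len(points) for coord in zip(*points))
        for label, points in groups.items()
    }


def predict(centroids, point):
    """Assign the label whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Toy training set (made up for illustration): observations with known labels.
X_train = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
y_train = ["small", "small", "large", "large"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, (2.0, 1.0)))
print(predict(centroids, (7.0, 9.0)))
```

The training step uses the known labels to summarize each category, and prediction assigns a new observation to one of those categories. A regression model would instead return a continuous value, and a clustering method would have to form the groups without `y_train`.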

7859 questions
26
votes
4 answers

Difference between logistic regression and softmax regression

I know that logistic regression is for binary classification and softmax regression is for multi-class problems. Would there be any difference if I trained several logistic regression models on the same data and normalized their results to get a…
26
votes
3 answers

Understanding concept of Gaussian Mixture Models

I'm trying to understand GMM by reading the sources available online. I have done clustering with K-means and wanted to see how GMM compares to it. Here is what I have understood; please let me know if my understanding is wrong: GMM is like…
26
votes
11 answers

Error in ConfusionMatrix the data and reference factors must have the same number of levels

I've trained a tree model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error: Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same…
user2987739
26
votes
2 answers

How to apply standardization to SVMs in scikit-learn?

I'm using the current stable version 0.13 of scikit-learn. I'm applying a linear support vector classifier to some data using the class sklearn.svm.LinearSVC. In the chapter about preprocessing in scikit-learn's documentation, I've read the…
pemistahl
25
votes
2 answers

ValueError: continuous format is not supported

I have written a simple function where I am using the average_precision_score from scikit-learn to compute average precision. My Code: def compute_average_precision(predictions, gold): gold_predictions = np.zeros(predictions.size, dtype=np.int) …
Wasi Ahmad
25
votes
10 answers

Plotting learning curve in keras gives KeyError: 'val_acc'

I was trying to plot train and test learning curves in Keras; however, the following code produces a KeyError: 'val_acc' error. The official documentation states that in order to use 'val_acc' I need to enable validation and…
25
votes
3 answers

How to calculate TF*IDF for a single new document to be classified?

I am using document-term vectors to represent a collection of documents. I use TF*IDF to calculate the term weights for each document vector. Then I can use this matrix to train a model for document classification. I would like to classify…
25
votes
4 answers

Best way to combine probabilistic classifiers in scikit-learn

I have a logistic regression and a random forest and I'd like to combine them (ensemble) for the final classification probability calculation by taking an average. Is there a built-in way to do this in scikit-learn? Some way where I can use the…
user1507844
24
votes
2 answers

How to Interpret Predict Result of SVM in R?

I'm new to R and I'm using the e1071 package for SVM classification in R. I used the following code: data <- loadNumerical() model <- svm(data[,-ncol(data)], data[,ncol(data)], gamma=10) print(predict(model, data[c(1:20),-ncol(data)])) The…
Derrick Zhang
24
votes
7 answers

Invalid classes inferred from unique values of `y`. Expected: [0 1 2 3 4 5], got [1 2 3 4 5 6]

I trained on a dataset using the XGB Classifier, but I got this error locally. It worked on Colab, and my friends don't have any problem with the same code. I don't know what that error means... Invalid classes inferred from unique values of y. …
ohoh
24
votes
4 answers

access to numbers in classification_report - sklearn

This is a simple example of classification_report in sklearn from sklearn.metrics import classification_report y_true = [0, 1, 2, 2, 2] y_pred = [0, 0, 2, 2, 1] target_names = ['class 0', 'class 1', 'class 2'] print(classification_report(y_true,…
Hadij
24
votes
15 answers

Recognise an arbitrary date string

I need to be able to recognise date strings. It doesn't matter if I cannot distinguish between month and date (e.g. 12/12/10); I just need to classify the string as being a date, rather than converting it to a Date object. So, this is really a…
Joel
23
votes
5 answers

TensorFlow Object Detection API Weird Behavior

I was playing with TensorFlow's brand new Object Detection API and decided to train it on some other publicly available datasets. I happened to stumble upon this grocery dataset which consists of images of various brands of cigarette boxes on the…
23
votes
2 answers

sklearn logistic regression with unbalanced classes

I'm solving a classification problem with sklearn's logistic regression in Python. My problem is a general/generic one. I have a dataset with two classes/results (positive/negative or 1/0), but the set is highly unbalanced. There are ~5% positives…
agentscully
23
votes
3 answers

KNN classification with categorical data

I'm busy working on a project involving k-nearest neighbor (KNN) classification. I have mixed numerical and categorical fields. The categorical values are ordinal (e.g. bank name, account type). Numerical types are, for example, salary and age. There…
Graham