Questions tagged [classification]

In machine learning and statistics, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership (label) is known.

In machine learning and statistics, classification refers to the problem of predicting category memberships based on a set of pre-labeled examples. It is thus a type of supervised learning.

Some of the most important classification algorithms are support vector machines, logistic regression, naive Bayes, random forest and artificial neural networks.

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as regression. The unsupervised counterpart to classification is known as clustering (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.
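As a rough illustration of the supervised setup described above, here is a minimal sketch of logistic regression (one of the algorithms named earlier) trained by batch gradient descent in plain Python. The toy data, the function names `train_logistic_regression` and `predict`, and the learning rate and epoch count are all assumptions made up for this example; in practice a library implementation would normally be used.

```python
import math

def train_logistic_regression(X, y, lr=0.1, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the mean log loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - yi                    # derivative of the log loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b

def predict(w, b, x):
    """Return the predicted label (0 or 1) for a single observation."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0

# Hypothetical labeled training set: two well-separated groups of 2-D points.
X_train = [[1.0, 1.2], [0.8, 0.9], [1.3, 0.7], [3.0, 3.2], [3.4, 2.9], [2.8, 3.5]]
y_train = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(X_train, y_train)

# New observations whose category is unknown to the model.
predictions = [predict(w, b, x) for x in [[1.0, 1.0], [3.05, 3.2]]]
print(predictions)
```

The training set plays the role of the pre-labeled examples, and `predict` assigns a category to observations the model has not seen.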

7859 questions
23
votes
5 answers

PCA first or normalization first?

When doing regression or classification, what is the correct (or better) way to preprocess the data? (1) Normalize the data -> PCA -> training; (2) PCA -> normalize PCA output -> training; (3) Normalize the data -> PCA -> normalize PCA output -> training. Which…
AlanS
  • 738
  • 1
  • 6
  • 13
22
votes
2 answers

How to get different Variable Importance for each class in a binary h2o GBM in R?

I'm trying to explore the use of a GBM with h2o for a classification problem to replace a logistic regression (GLM). The non-linearity and interactions in my data make me think a GBM is more suitable. I've run a baseline GBM (see below) and compared…
wake_wake
  • 1,332
  • 2
  • 19
  • 46
21
votes
3 answers

Multi-layer neural network won't predict negative values

I have implemented a multilayer perceptron to predict the sine of input vectors. Each vector consists of four values chosen at random from -1, 0 and 1, plus a bias set to 1. The network should predict the sine of the sum of the vector's contents, e.g. Input = <0,1,-1,0,1>…
B. Bowles
  • 764
  • 4
  • 9
  • 21
21
votes
2 answers

How to add another feature (length of text) to current bag of words classification? Scikit-learn

I am using bag of words to classify text. It's working well but I am wondering how to add a feature which is not a word. Here is my sample code. import numpy as np from sklearn.pipeline import Pipeline from sklearn.feature_extraction.text import…
21
votes
5 answers

Why use softmax only in the output layer and not in hidden layers?

Most examples of neural networks for classification tasks I've seen use a softmax layer as the output activation function. Normally, the other hidden units use a sigmoid, tanh, or ReLU function as the activation function. Using the softmax function here…
21
votes
2 answers

Combining random forest models in scikit learn

I have two RandomForestClassifier models, and I would like to combine them into one meta model. They were both trained using similar, but different, data. How can I do this? rf1 #this is my first fitted RandomForestClassifier object, with 250…
mgoldwasser
  • 14,558
  • 15
  • 79
  • 103
21
votes
2 answers

Scikit classification report - change the format of displayed results

The scikit-learn classification report shows precision and recall scores with only two digits. Is it possible to make it display four digits after the decimal point, i.e. 0.6783 instead of 0.67? from sklearn.metrics import classification_report print…
Crista23
  • 3,203
  • 9
  • 47
  • 60
21
votes
6 answers

Know any good C++ support vector machine (SVM) libraries?

Do you know of any good C++ SVM libraries out there? I tried libsvm http://www.csie.ntu.edu.tw/~cjlin/libsvm/ but so far I'm not flabbergasted. I have also heard of SVMLight and TinySVM. Have you tried them? Any new players? Thanks!
levesque
  • 8,756
  • 10
  • 36
  • 44
20
votes
2 answers

Difference between Dense(2) and Dense(1) as the final layer of a binary classification CNN?

In a CNN for binary classification of images, should the shape of output be (number of images, 1) or (number of images, 2)? Specifically, here are 2 kinds of last layer in a CNN: keras.layers.Dense(2, activation =…
20
votes
1 answer

Loss & accuracy - Are these reasonable learning curves?

I am learning neural networks and I built a simple one in Keras for the iris dataset classification from the UCI machine learning repository. I used a network with one hidden layer of 8 hidden nodes. The Adam optimizer is used with a learning rate of…
Ananda
  • 2,925
  • 5
  • 22
  • 45
20
votes
4 answers

List of all classification algorithms

I have a classification problem and I would like to try all the available algorithms to compare their performance on it. If you know any classification algorithm other than those listed below, please list it…
20
votes
2 answers

TPR & FPR Curve for different classifiers - kNN, NaiveBayes, Decision Trees in R

I'm trying to understand and plot TPR/FPR for different types of classifiers. I'm using kNN, NaiveBayes and Decision Trees in R. With kNN I'm doing the following: clnum <- as.vector(diabetes.trainingLabels[,1], mode = "numeric") dpknn <- knn(train =…
Kris
  • 5,714
  • 2
  • 27
  • 47
20
votes
8 answers

I want a machine to learn to categorize short texts

I have a ton of short stories about 500 words long and I want to categorize them into one of, let's say, 20 categories: Entertainment, Food, Music, etc. I can hand-classify a bunch of them, but I want to implement machine learning to guess the…
atp
  • 30,132
  • 47
  • 125
  • 187
20
votes
5 answers

Where is it best to use svm with linear kernel?

I am currently studying SVMs and was wondering what the application of SVMs with a linear kernel is. In my opinion it must be something applied to solving a linear optimization problem. Is this correct? I appreciate your answer!
Carol.Kar
  • 4,581
  • 36
  • 131
  • 264
20
votes
4 answers

How does music fingerprinting work (for sites such as Shazam and Lala.com)?

My large (120 GB) music collection contains many duplicate songs, and I've been trying to fingerprint tracks in the hope of detecting duplicates. And since I'm a CS major, I'm very curious as to what is done out there. Nothing I do has nearly the…
Niels Joubert
  • 342
  • 3
  • 8