Questions tagged [classification]

In machine learning and statistics, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership (label) is known.

In machine learning and statistics, classification refers to the problem of predicting category memberships based on a set of pre-labeled examples. It is thus a type of supervised learning.

Some of the most important classification algorithms are support vector machines, logistic regression, naive Bayes, random forest and artificial neural networks.

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as regression. The unsupervised counterpart to classification is known as clustering (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.
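As a minimal sketch of the supervised setup described above, the following Python snippet learns from a small set of pre-labeled examples and then predicts the category of new observations. It uses a nearest-centroid rule, which is chosen here only because it fits in a few lines; it is not one of the algorithms named in this tag description. The toy data, the labels "a" and "b", and the function names fit_centroids and predict are all illustrative assumptions.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(points, labels):
    """Compute the mean point (centroid) of each labeled class."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, point):
    """Assign the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical toy training set: 2-D observations with known labels.
train_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
                (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

centroids = fit_centroids(train_points, train_labels)

# New observations whose category membership is unknown.
print(predict(centroids, (1.5, 1.5)))
print(predict(centroids, (7.5, 8.5)))
```

The training step only sees labeled observations, and the prediction step only sees an unlabeled one, which is what makes this a supervised classification problem rather than clustering.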

7859 questions
16
votes
5 answers

How to use or abuse artifact classifiers in maven?

We are currently attempting to port a very (very) large project built with ant to maven (while also moving to svn). All possibilities are being explored in remodeling the project structure to best fit the maven paradigm. Now to be more specific, I…
Yaneeve
  • 4,751
  • 10
  • 49
  • 87
16
votes
1 answer

Difference between predict vs predict_proba in scikit-learn

Suppose I have created a model, and my target variable is either 0, 1 or 2. It seems that if I use predict, the answer is either of 0, or 1 or 2. But if I use predict_proba, I get a row with 3 cols for each row as follows, for example model =…
16
votes
2 answers

Sigmoid output - can it be interpreted as probability?

The sigmoid function outputs a number between 0 and 1. Is this a probability, or is it merely a 'yes or no' depending on whether it's above or below 0.5? Minimal example: Cats vs dogs binary classification. 0 is cat, 1 is dog. Can I perform the…
Voy
  • 5,286
  • 1
  • 49
  • 59
16
votes
1 answer

loss, val_loss, acc and val_acc do not update at all over epochs

I created an LSTM network for sequence classification (binary) where each sample has 25 timesteps and 4 features. The following is my keras network topology: Above, the activation layer after Dense layer uses softmax function. I used…
Kaushik Shrestha
  • 932
  • 1
  • 11
  • 26
16
votes
2 answers

UserWarning: Label not :NUMBER: is present in all training examples

I am doing multilabel classification, where I try to predict correct labels for each document and here is my code: mlb = MultiLabelBinarizer() X = dataframe['body'].values y = mlb.fit_transform(dataframe['tag'].values) classifier = Pipeline([ …
16
votes
1 answer

Multi-output neural network combining regression and classification

If you have both a classification and regression problem that are related and rely on the same input data, is it possible to successfully architect a neural network that gives both classification and regression outputs? If so, how might the loss…
16
votes
1 answer

Why use a restricted Boltzmann machine rather than a multi-layer perceptron?

I'm trying to understand the difference between a restricted Boltzmann machine (RBM), and a feed-forward neural network (NN). I know that an RBM is a generative model, where the idea is to reconstruct the input, whereas an NN is a discriminative…
Karnivaurus
  • 22,823
  • 57
  • 147
  • 247
16
votes
3 answers

How do you draw a line using the weight vector in a Linear Perceptron?

I understand the following: In 2D space, each data point has 2 features: x and y. The weight vector in 2D space contains 3 values [bias, w0, w1] which can be rewritten as [w0,w1,w2]. Each datapoint needs an artificial coordinate [1, x, y] for the…
user1337603
  • 285
  • 2
  • 4
  • 9
16
votes
2 answers

Difference between glmnet() and cv.glmnet() in R?

I'm working on a project that would show the potential influence a group of events have on an outcome. I'm using the glmnet() package, specifically using the Poisson feature. Here's my code: # de <- data imported from sql connection x <-…
Sean Branchaw
  • 597
  • 1
  • 5
  • 21
16
votes
3 answers

Probability and Neural Networks

Is it good practice to use sigmoid or tanh output layers in neural networks directly to estimate probabilities, i.e. to take the output of the sigmoid function in the NN as the probability of a given input occurring? EDIT: I wanted to use neural network to learn…
Betamoo
  • 14,964
  • 25
  • 75
  • 109
16
votes
3 answers

Beginner's resources/introductions to classification algorithms

everybody. I am entirely new to the topic of classification algorithms, and need a few good pointers about where to start some "serious reading". I am right now in the process of finding out, whether machine learning and automated classification…
16
votes
5 answers

Learning and using augmented Bayes classifiers in python

I'm trying to use a forest (or tree) augmented Bayes classifier (Original introduction, Learning) in python (preferably python 3, but python 2 would also be acceptable), first learning it (both structure and parameter learning) and then using it for…
Anaphory
  • 6,045
  • 4
  • 37
  • 68
15
votes
5 answers

Visualizing Weka classification tree

I am using a few data sets available online and trying to visualize a tree. However, it does not give me the visualize tree option at all. Could anyone please guide me on how to get the tree diagram in Weka using data sets available online?
Ramakrishna
15
votes
2 answers

Algorithm to classify a list of products? Take 2

I asked a question similar to this one a couple of weeks ago, but I did not ask the question correctly. So I am re-asking here the question with more details and I would like to get a more AI oriented answer. I have a list representing products…
Martin
  • 39,309
  • 62
  • 192
  • 278
15
votes
1 answer

HBase & Mahout - Using HBase as a Datastore/source for Mahout - Classification

I'm working on a large text classification project and we have our text data (simple messages) stored in HBase. We have two problems. First, we would like to use HBase as the source for Mahout classifiers, namely Bayes and Random Forests. Second,…
NightWolf
  • 7,694
  • 9
  • 74
  • 121