Questions tagged [classification]

In machine learning and statistics, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership (label) is known.

In machine learning and statistics, classification refers to the problem of predicting category memberships based on a set of pre-labeled examples. It is thus a type of supervised learning.
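As a rough illustration of learning from pre-labeled examples, here is a minimal sketch of a nearest-centroid classifier in plain Python. The toy data, the "ham"/"spam" labels, and the function names `fit_centroids` and `predict` are made up for this example. It is only one simple way to do classification, not a recommended method.

```python
from collections import defaultdict

def fit_centroids(X, y):
    """Average the feature vectors of each class in the labeled training set."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(centroids, features):
    """Assign a new observation to the class whose centroid is closest."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))

# Toy training set (hypothetical values, for illustration only).
X_train = [[1.0, 1.0], [1.5, 2.0], [8.0, 9.0], [9.0, 8.5]]
y_train = ["ham", "ham", "spam", "spam"]

centroids = fit_centroids(X_train, y_train)
label_near_origin = predict(centroids, [2.0, 1.0])
label_far = predict(centroids, [8.5, 9.5])
print(label_near_origin, label_far)
```

The training step only uses observations whose label is known. The prediction step then assigns a category to an observation that was not in the training set.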

Some of the most important classification algorithms are support vector machines, logistic regression, naive Bayes, random forest and artificial neural networks.

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as regression. The unsupervised counterpart to classification is known as clustering (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.

7859 questions
19
votes
4 answers

Neural networks for email spam detection

Let's say you have access to an email account with the history of received emails from the last few years (~10k emails), classified into 2 groups: genuine email and spam. How would you approach the task of creating a neural network solution that could be…
kristof
  • 52,923
  • 24
  • 87
  • 110
19
votes
2 answers

How to set a threshold for a sklearn classifier based on ROC results?

I trained an ExtraTreesClassifier (gini index) using scikit-learn and it suits my needs fairly well. The accuracy is not so good, but using 10-fold cross-validation, the AUC is 0.95. I would like to use this classifier in my work. I am quite new to ML, so please…
Colis
  • 323
  • 1
  • 2
  • 7
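One common heuristic for turning ROC results into a decision threshold is to pick the cutoff that maximizes Youden's J statistic (TPR minus FPR). The sketch below does this in plain Python. It assumes the scores are positive-class probabilities, for example from a classifier's `predict_proba` output. The data and the helper names `roc_points` and `best_threshold` are hypothetical, and other criteria, such as cost-weighted ones, may suit a given application better.

```python
def roc_points(y_true, scores):
    """Return (threshold, fpr, tpr) for each distinct score used as a cutoff."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score is >= the cutoff.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((thr, fp / neg, tp / pos))
    return points

def best_threshold(y_true, scores):
    """Pick the cutoff that maximizes Youden's J = TPR - FPR."""
    return max(roc_points(y_true, scores), key=lambda p: p[2] - p[1])[0]

# Hypothetical labels and predicted positive-class probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.5, 0.9]

threshold = best_threshold(y_true, scores)
y_pred = [1 if s >= threshold else 0 for s in scores]
print(threshold, y_pred)
```

To avoid an optimistic estimate, the threshold would normally be chosen on held-out or cross-validated scores, not on the data used to fit the model.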
19
votes
2 answers

How to implement decision tree with C# (Visual Studio 2008) - Help

I have a decision tree that I need to turn into code in C#. The simple way of doing it is using if-else statements, but in this solution I will need to create 4-5 nested conditions. I am looking for a better way to do it and so far I read a little bit…
Chen
  • 191
  • 1
  • 1
  • 3
19
votes
9 answers

Using Artificial Intelligence (AI) to predict Stock Prices

Given a set of data very similar to the Motley Fool CAPS system, where individual users enter BUY and SELL recommendations on various equities. What I would like to do is show each recommendation and, I guess, somehow rate (1-5) as to whether it was…
akaphenom
  • 6,728
  • 10
  • 59
  • 109
19
votes
2 answers

How to perform logistic regression using vowpal wabbit on very imbalanced dataset

I am trying to use vowpal wabbit for logistic regression. I am not sure if this is the right syntax to do it. For training, I do ./vw -d ~/Desktop/new_data.txt --passes 20 --binary --cache_file cache.txt -f lr.vw --loss_function logistic --l1…
user34790
  • 2,020
  • 7
  • 30
  • 37
19
votes
2 answers

Lucene: exception - Query parser encountered after "some word"

I am working on a classification problem to classify product reviews as positive, negative or neutral as per the training data using Lucene API. I am using an ArrayList of Review objects - "reviewList" that stores the attributes for each review…
Reema
  • 1,147
  • 1
  • 9
  • 11
18
votes
2 answers

Retraining after Cross Validation with libsvm

I know that cross-validation is used for selecting good parameters. After finding them, I need to re-train on the whole data without the -v option. But the problem I face is that after I train with the -v option, I get the cross-validation accuracy (e.g…
lakshmen
  • 28,346
  • 66
  • 178
  • 276
18
votes
3 answers

How to correct unstable loss and accuracy during training? (binary classification)

I am currently working on a small binary classification project using the new Keras API in TensorFlow. The problem is a simplified version of the Higgs Boson challenge posted on Kaggle.com a few years back. The dataset shape is 2000x14, where the…
Mustfled
  • 181
  • 1
  • 1
  • 4
18
votes
2 answers

Getting a low ROC AUC score but a high accuracy

I am using the LogisticRegression class in scikit-learn on a version of the flight delay dataset. I use pandas to select some columns: df = df[["MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK", "ORIGIN", "DEST", "CRS_DEP_TIME", "ARR_DEL15"]]. I fill in NaN values…
Jon
  • 2,644
  • 1
  • 22
  • 31
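High accuracy combined with a low ROC AUC often shows up on imbalanced data, because accuracy rewards always predicting the majority class while AUC measures how well the scores rank positives above negatives. Whether that is the cause in this question cannot be told from the excerpt. The sketch below uses made-up labels and an uninformative constant score to show the effect. The helper names `accuracy` and `roc_auc` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# An uninformative model that gives every sample the same low score.
scores = [0.2] * 100
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(acc, auc)
```

Here the model never predicts the positive class, so it is right on every negative. Its scores carry no ranking information, which is what the AUC reflects.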
18
votes
1 answer

Adding Gaussian noise to a dataset of floating points and saving it (Python)

I'm working on a classification problem where I need to add different levels of Gaussian noise to my dataset and run classification experiments until my ML algorithms can't classify the dataset. Unfortunately, I have no idea how to do that. Any advice…
sara
  • 311
  • 1
  • 3
  • 7
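A minimal sketch of one way to do this with only the Python standard library: add zero-mean Gaussian noise with a chosen standard deviation to every value, then write each noisy copy to a CSV file. The data, the noise levels, the file names, and the helpers `add_gaussian_noise` and `save_csv` are all assumptions for illustration. With NumPy arrays the same idea is usually written with vectorized random draws instead.

```python
import csv
import os
import random
import tempfile

def add_gaussian_noise(rows, sigma, seed=0):
    """Return a copy of rows with N(0, sigma^2) noise added to each value."""
    rng = random.Random(seed)  # fixed seed so experiments are repeatable
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in rows]

def save_csv(rows, path):
    """Write the rows to a CSV file, one sample per line."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

# Hypothetical dataset of floating-point features.
data = [[0.5, 1.2, 3.3], [2.1, 0.7, 1.9]]

out_dir = tempfile.mkdtemp()  # placeholder output location
noisy_by_level = {}
for sigma in (0.0, 0.1, 0.5, 1.0):
    noisy = add_gaussian_noise(data, sigma)
    noisy_by_level[sigma] = noisy
    save_csv(noisy, os.path.join(out_dir, f"noisy_sigma_{sigma}.csv"))
```

Each saved file can then be fed to the classifier in turn, raising sigma until accuracy drops to an unacceptable level. Scaling sigma relative to each feature's spread may be more meaningful when features have very different ranges.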
18
votes
1 answer

Binary classification with Softmax

I am training a binary classifier using a sigmoid activation function with binary cross-entropy, which gives good accuracy of around 98%. When I train the same model using softmax with categorical_crossentropy, it gives very low accuracy (< 40%). I am passing the…
AKSHAYAA VAIDYANATHAN
  • 2,715
  • 7
  • 30
  • 51
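For background, a two-unit softmax is mathematically equivalent to a sigmoid applied to the difference of the two logits, so the two setups can in principle reach the same accuracy. A softmax over a single output unit, on the other hand, always returns 1. That is one possible source of trouble, though the excerpt does not show whether it applies here. The sketch below checks both facts numerically, using made-up logit values.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for class 0 and class 1.
z0, z1 = 0.3, 1.7

p_softmax = softmax([z0, z1])[1]   # P(class 1) from a 2-unit softmax
p_sigmoid = sigmoid(z1 - z0)       # P(class 1) from a sigmoid on the logit gap

# Softmax over one unit normalizes a single value by itself.
single_unit = softmax([2.5])

print(p_softmax, p_sigmoid, single_unit)
```

In practice this means a softmax head for a binary problem needs two output units and one-hot (or sparse integer) labels, whereas a sigmoid head needs one unit and 0/1 labels.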
18
votes
7 answers

Class-wise precision and recall for multi-class classification in TensorFlow?

Is there a way to get per-class precision or recall when doing multiclass classification using TensorFlow? For example, if I have y_true and y_pred from each batch, is there a functional way to get precision or recall per class if I have more than…
prateek agrawal
  • 453
  • 1
  • 4
  • 13
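Independently of any framework, per-class precision and recall come from treating each class in turn as the positive class and counting true positives, false positives, and false negatives. The sketch below is plain Python, not TensorFlow code. The labels and the function name `per_class_precision_recall` are hypothetical.

```python
def per_class_precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)} using one-vs-rest counts."""
    labels = sorted(set(y_true) | set(y_pred))
    result = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Report 0.0 when a class was never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        result[c] = (precision, recall)
    return result

# Hypothetical integer class labels for one batch.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

metrics = per_class_precision_recall(y_true, y_pred)
for label, (precision, recall) in metrics.items():
    print(label, round(precision, 3), round(recall, 3))
```

When metrics are needed over a whole dataset and not a single batch, the tp/fp/fn counts should be accumulated across batches before dividing, since averaging per-batch ratios gives a different result.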
18
votes
1 answer

Keras Classification - Object Detection

I am working on classification and then object detection with Keras and Python. I have classified cats/dogs with 80%+ accuracy, and I'm OK with the current result for now. My question is: how do I detect a cat or dog in an input image? I'm completely…
Powisss
  • 1,072
  • 3
  • 9
  • 16
18
votes
1 answer

How to read data into TensorFlow batches from example queue?

How do I get TensorFlow example queues into proper batches for training? I've got some images and labels: IMG_6642.JPG 1, IMG_6643.JPG 2 (feel free to suggest another label format; I think I may need another dense to sparse step...) I've read…
JohnAllen
  • 7,317
  • 9
  • 41
  • 65
18
votes
3 answers

Cost function in logistic regression gives NaN as a result

I am implementing logistic regression using batch gradient descent. There are two classes into which the input samples are to be classified. The classes are 1 and 0. While training the data, I am using the following sigmoid function: t = 1 ./ (1 +…
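A frequent cause of NaN in a logistic regression cost is a saturated sigmoid: for large |z| it rounds to exactly 0 or 1 in floating point, and the cost then evaluates log(0). The excerpt is cut off, so whether that is the issue here is an assumption. The sketch below is written in Python, not the question's own language, and the names `softplus` and `stable_log_loss` are hypothetical. It computes the same cross-entropy directly from the logit z using the identity loss = log(1 + exp(z)) - y*z, which never forms log(sigmoid(z)).

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def stable_log_loss(y, z):
    """Cross-entropy for label y in {0, 1} and logit z.

    Equals -y*log(s) - (1-y)*log(1-s) with s = sigmoid(z),
    rewritten as softplus(z) - y*z to avoid log(0).
    """
    return softplus(z) - y * z

# With a large logit, 1 - sigmoid(z) rounds to exactly 0.0 in double
# precision, so the naive formula would take log(0).
saturated = 1.0 - 1.0 / (1.0 + math.exp(-40.0))

loss_confident_right = stable_log_loss(1, 40.0)  # true label matches
loss_confident_wrong = stable_log_loss(0, 40.0)  # true label does not match
print(saturated, loss_confident_right, loss_confident_wrong)
```

Other remedies that are commonly used include clipping the predicted probabilities away from 0 and 1 before taking logs, and scaling the input features so the logits stay in a moderate range.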