Questions tagged [classification]

In machine learning and statistics, classification is the problem of identifying which of a set of categories a new observation belongs to, on the basis of a training set of data containing observations whose category membership (label) is known.

In machine learning and statistics, classification refers to the problem of predicting category memberships based on a set of pre-labeled examples. It is thus a type of supervised learning.

Some of the most important classification algorithms are support vector machines , logistic regression, naive Bayes, random forest and artificial neural networks .

When we wish to associate inputs with continuous values in a supervised framework, the problem is instead known as . The unsupervised counterpart to classification is known as (or cluster analysis), and involves grouping data into categories based on some measure of inherent similarity.

7859 questions
18
votes
3 answers

GBM R function: get variable importance separately for each class

I am using the gbm function in R (gbm package) to fit stochastic gradient boosting models for multiclass classification. I am simply trying to obtain the importance of each predictor separately for each class, like in this picture from the Hastie…
Antoine
  • 1,649
  • 4
  • 23
  • 50
18
votes
1 answer

Finding K-nearest neighbors and its implementation

I am working on classifying simple data using KNN with Euclidean distance. I have seen an example on what I would like to do that is done with the MATLAB knnsearch function as shown below: load fisheriris x =…
Young_DataAnalyst
  • 263
  • 2
  • 4
  • 11
18
votes
5 answers

How to approach machine learning problems with high dimensional input space?

How should I approach a situtation when I try to apply some ML algorithm (classification, to be more specific, SVM in particular) over some high dimensional input, and the results I get are not quite satisfactory? 1, 2 or 3 dimensional data can be…
sold
  • 393
  • 1
  • 3
  • 6
17
votes
9 answers

Determine whether the two classes are linearly separable (algorithmically in 2D)

There are two classes, let's call them X and O. A number of elements belonging to these classes are spread out in the xy-plane. Here is an example where the two classes are not linearly separable. It is not possible to draw a straight line that…
Håvard Geithus
  • 5,544
  • 7
  • 36
  • 51
17
votes
3 answers

What's the best open-source Java Bayesian spam filter library?

In other answers at Stackoverflow it's been suggested that Weka is good, but there are others (Classifier4j, jBNC, Naiban). Does anyone have actual experience with these?
Jason Cohen
  • 81,399
  • 26
  • 107
  • 114
17
votes
1 answer

Neural Network Ordinal Classification for Age

I have created a simple neural network (Python, Theano) to estimate a persons age based on their spending history from a selection of different stores. Unfortunately, it is not particularly accurate. The accuracy might be hurt by the fact that the…
17
votes
1 answer

How to implement pixel-wise classification for scene labeling in TensorFlow?

I am working on a deep learning model using Google's TensorFlow. The model should be used to segment and label scenes. I am using the SiftFlow dataset which has 33 semantic classes and images with 256x256 pixels. As a result, at my final layer…
Gooshan
  • 2,361
  • 1
  • 20
  • 15
17
votes
3 answers

Monitor training/validation process in Caffe

I'm training Caffe Reference Model for classifying images. My work requires me to monitor the training process by drawing graph of accuracy of the model after every 1000 iterations on entire training set and validation set which has 100K and 50K…
DucCuong
  • 638
  • 1
  • 7
  • 26
17
votes
5 answers

What is the difference between classification and prediction?

What is the difference between classification and prediction in machine learning?
James
  • 181
  • 1
  • 1
  • 4
17
votes
2 answers

Sentiment analysis with NLTK python for sentences using sample data or webservice?

I am embarking upon a NLP project for sentiment analysis. I have successfully installed NLTK for python (seems like a great piece of software for this). However,I am having trouble understanding how it can be used to accomplish my task. Here is my…
Ke.
  • 2,484
  • 8
  • 40
  • 78
17
votes
2 answers

Dealing with the class imbalance in binary classification

Here's a brief description of my problem: I am working on a supervised learning task to train a binary classifier. I have a dataset with a large class imbalance distribution: 8 negative instances every one positive. I use the f-measure, i.e. the…
blueSurfer
  • 5,651
  • 13
  • 42
  • 63
17
votes
3 answers

Naive Bayes: Imbalanced Test Dataset

I am using scikit-learn Multinomial Naive Bayes classifier for binary text classification (classifier tells me whether the document belongs to the category X or not). I use a balanced dataset to train my model and a balanced test set to test it and…
17
votes
2 answers

Learning Weka on the Command Line

I am fairly new to Weka and even more new to Weka on the command line. I find documentation is poor and I am struggling to figure out a few things to do. For example, want to take two .arff files, one for training, one for testing and get an…
Reily Bourne
  • 5,117
  • 9
  • 30
  • 41
17
votes
2 answers

How to read the classifier confusion matrix in WEKA

Sorry, I am new to WEKA and just learning. In my decision tree (J48) classifier output, there is a confusion Matrix: a b <----- classified as 130 8 a = functional 15 150 b = non-functional How do I read this matrix? What's the…
JakeSays
  • 2,048
  • 8
  • 29
  • 43
16
votes
1 answer

Extract tf-idf vectors with lucene

I have indexed a set of documents using lucene. I also have stored DocumentTermVector for each document content. I wrote a program and got the term frequency vector for each document, but how can I get tf-idf vector of each document? Here is my code…
orezvani
  • 3,595
  • 8
  • 43
  • 57