Questions tagged [knn]

In pattern recognition, k-nearest neighbors (k-NN) is a classification algorithm used to classify an example based on a set of already classified examples. Algorithm: a case is classified by a majority vote of its neighbors, being assigned to the class most common among its k nearest neighbors as measured by a distance function. If k = 1, the case is simply assigned to the class of its single nearest neighbor.

The idea of the k-nearest neighbors (k-NN) algorithm is to use the features of an example, which are known, to determine its classification, which is unknown.

First, a set of classified samples is supplied to the algorithm as a training set. When a new, unclassified sample is given, the algorithm finds the k nearest neighbors of the new sample and determines its classification from the classes of those neighbors.

The algorithm is sometimes called a lazy classifier because it does nothing during "learning" beyond storing the samples; all the work is deferred to classification time.

Algorithm

The k-NN algorithm is among the simplest of all machine learning algorithms. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.

A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance).
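As a concrete illustration of the two phases described above (training as storage, classification as a majority vote under a distance metric), here is a minimal brute-force sketch in Python; the function name and toy data are illustrative, and NumPy is assumed:

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_query, k=3):
        # "Training" is nothing more than keeping X_train / y_train in memory.
        dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distance to every sample
        nearest = np.argsort(dists)[:k]                    # indices of the k closest samples
        return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote

    X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [4.9, 5.0]])
    y = np.array([0, 0, 1, 1])
    print(knn_predict(X, y, np.array([4.8, 5.2]), k=3))  # -> 1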

Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with a specialized algorithm such as Large Margin Nearest Neighbor or Neighbourhood Components Analysis.
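For example, scikit-learn (version 0.21 and later) ships Neighbourhood Components Analysis, which can be chained in front of k-NN; a hedged sketch on a standard dataset:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
    from sklearn.pipeline import Pipeline

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # NCA learns a linear transformation of the feature space; k-NN then uses
    # ordinary Euclidean distance in the transformed space.
    model = Pipeline([
        ("nca", NeighborhoodComponentsAnalysis(random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ])
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))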


1775 questions
10 votes, 3 answers

How to use both binary and continuous features in the k-Nearest-Neighbor algorithm?

My feature vector has both continuous (or widely ranging) and binary components. If I simply use Euclidean distance, the continuous components will have a much greater impact: Representing symmetric vs. asymmetric as 0 and 1 and some less important…
John Hall • 103 • 4
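One possible approach to this question (an illustrative sketch with a hypothetical helper, not the accepted answer): standardize the continuous columns so they cannot dominate the Euclidean distance, and give each feature an explicit weight:

    import numpy as np

    def weighted_distance(a, b, weights):
        # weighted Euclidean distance; larger weight = more influence
        return np.sqrt(np.sum(weights * (a - b) ** 2))

    # column 0: continuous (e.g. a length in cm); column 1: binary flag
    X = np.array([[170.0, 1.0],
                  [165.0, 0.0],
                  [180.0, 1.0]])

    # z-score the continuous column so it no longer dwarfs the 0/1 column
    X[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()

    w = np.array([1.0, 0.5])  # hypothetical per-feature importances
    print(weighted_distance(X[0], X[1], w))
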
10 votes, 1 answer

Predict with sklearn-KNN using median (instead of mean)

Sklearn-KNN allows one to set weights (e.g., uniform, distance) when calculating the mean of the x nearest neighbours. Instead of predicting with the mean, is it possible to predict with the median (perhaps with a user-defined function)?
Eugene Yan • 841 • 2 • 9 • 23
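One workaround, sketched below (assuming a manual computation is acceptable): ask the fitted estimator for the neighbor indices via kneighbors() and take the median of their targets yourself:

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
    y = np.array([1.0, 2.0, 100.0, 4.0, 5.0])  # note the outlier at x = 2

    knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
    idx = knn.kneighbors(np.array([[2.0]]), return_distance=False)
    print(np.median(y[idx[0]]))  # median of the 3 nearest targets: robust to the outlier
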
10 votes, 3 answers

How to convert distance into probability?

Can anyone shed some light on my Matlab program? I have data from two sensors and I'm doing a kNN classification for each of them separately. In both cases the training set looks like a set of vectors of 42 rows total, like this: [44 12 53 29 35 30 49; …
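A common heuristic for this (an assumption about intent, not the asker's required method) is a softmax over negative distances, so smaller distances map to larger probabilities that sum to 1; a NumPy sketch with a hypothetical helper:

    import numpy as np

    def distances_to_probabilities(d):
        # softmax over negative distances: closest -> highest probability
        s = np.exp(-np.asarray(d, dtype=float))
        return s / s.sum()

    d = [0.4, 1.5, 3.0]  # e.g. distance to the nearest neighbour of each class
    print(distances_to_probabilities(d))  # sums to 1, largest value for d = 0.4
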
10 votes, 3 answers

How to measure the accuracy of a kNN classifier in Python

I have used kNN to classify my dataset. But I do not know how to measure the accuracy of the trained classifier. Does scikit-learn have any built-in function to check the accuracy of a kNN classifier? from sklearn.neighbors import KNeighborsClassifier knn =…
user1946217 • 1,733 • 6 • 31 • 40
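scikit-learn does provide this: the classifier's score() method returns mean accuracy, and sklearn.metrics.accuracy_score gives the same number; a short sketch on a standard dataset:

    from sklearn.datasets import load_iris
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print(knn.score(X_test, y_test))                    # mean accuracy on held-out data
    print(accuracy_score(y_test, knn.predict(X_test)))  # the same number, computed explicitly
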
10 votes, 2 answers

kNN: training, testing, and validation

I am extracting image features from 10 classes with 1000 images each. Since there are 50 features that I can extract, I am thinking of finding the best feature combination to use here. Training, validation and test sets are divided as…
klijo • 15,761 • 8 • 34 • 49
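A minimal sketch of one common division (the 60/20/20 ratio and random data are assumptions): two calls to train_test_split produce train, validation, and test sets:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(1000, 50)        # stand-in for 50 image features
    y = np.random.randint(0, 10, 1000)  # stand-in for 10 classes

    # 60% train, then split the remaining 40% in half: 20% validation, 20% test
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
    print(len(X_train), len(X_val), len(X_test))  # 600 200 200
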
9 votes, 3 answers

Faster kNN Classification Algorithm in Python

I want to code my own kNN algorithm from scratch, because I need to weight the features. The problem is that my program is still really slow despite removing for loops and using built-in NumPy functionality. Can anyone suggest a way to…
Eoin Ó Coinnigh • 307 • 1 • 3 • 7
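The usual speed-up for a from-scratch NumPy k-NN is to compute all pairwise squared distances at once via the expansion ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b, and to use argpartition instead of a full sort; a hedged sketch (the helper name and sizes are illustrative, and feature weighting is folded in by rescaling columns):

    import numpy as np

    def knn_indices(X_train, X_query, k, feature_weights=None):
        if feature_weights is not None:
            w = np.sqrt(feature_weights)          # weighted Euclidean = rescale columns
            X_train, X_query = X_train * w, X_query * w
        # squared distances for every (query, train) pair in one shot
        d2 = ((X_query ** 2).sum(1)[:, None]
              + (X_train ** 2).sum(1)[None, :]
              - 2.0 * X_query @ X_train.T)
        return np.argpartition(d2, k, axis=1)[:, :k]  # k nearest per query, unordered

    X = np.random.rand(10000, 20)
    Q = np.random.rand(100, 20)
    print(knn_indices(X, Q, k=5, feature_weights=np.ones(20)).shape)  # (100, 5)
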
9 votes, 1 answer

Tuning leaf_size to decrease time consumption in Scikit-Learn KNN

I was trying to implement KNN for handwritten character recognition when I found that the code was taking a lot of time to execute. When I added the parameter leaf_size with value 400, I observed that the time taken by the code to execute was significantly…
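leaf_size is indeed a real KNeighborsClassifier parameter; it sets the point at which the tree falls back to brute force at the leaves, trading construction time against query time. A timing sketch (the dataset and values are illustrative):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.random.rand(20000, 64)          # stand-in for handwritten-character features
    y = np.random.randint(0, 10, 20000)

    for leaf in (30, 400):                 # 30 is scikit-learn's default
        knn = KNeighborsClassifier(n_neighbors=3, algorithm="kd_tree", leaf_size=leaf)
        knn.fit(X, y)
        knn.predict(X[:100])               # time this call to compare the two settings
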
9 votes, 2 answers

Grid Search parameter and cross-validated data set in KNN classifier in Scikit-learn

I'm trying to build my first KNN classifier using scikit-learn. I've been following the User Guide and other online examples, but there are a few things I am unsure about. For this post, let's use the following: X = data, Y = target. In most…
browser • 313 • 1 • 3 • 12
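A minimal sketch of the usual pattern (the grid values are illustrative): GridSearchCV performs the cross-validation internally on the training portion, so only a final test set needs to be held out:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    param_grid = {"n_neighbors": list(range(1, 31)), "weights": ["uniform", "distance"]}
    search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
    search.fit(X_train, y_train)                     # cross-validation happens in here
    print(search.best_params_, search.score(X_test, y_test))
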
9 votes, 4 answers

Is there any function to calculate Precision and Recall using Matlab?

I have a problem calculating precision and recall for a classifier in Matlab. I use the fisherIris data (which consists of 150 data points: 50 setosa, 50 versicolor, 50 virginica). I have classified them using the kNN algorithm. Here is my confusion…
user19565 • 155 • 1 • 2 • 9
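The question asks about Matlab, but the arithmetic is language-independent; a NumPy sketch with an illustrative confusion matrix, assuming rows are true classes and columns are predictions:

    import numpy as np

    C = np.array([[50,  0,  0],   # illustrative 3-class confusion matrix
                  [ 0, 47,  3],
                  [ 0,  4, 46]])

    tp = np.diag(C).astype(float)
    precision = tp / C.sum(axis=0)  # true positives / everything predicted as that class
    recall = tp / C.sum(axis=1)     # true positives / everything actually in that class
    print(precision, recall)
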
9 votes, 2 answers

Is a k-d tree efficient for kNN search (k nearest neighbors search)?

I have to implement k nearest neighbors search for 10-dimensional data in a k-d tree. The problem is that my algorithm is very fast for k=1, but as much as 2000x slower for k>1 (k=2,5,10,20,100). Is this normal for k-d trees, or am I doing something…
Andraz • 709 • 2 • 7 • 16
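As a baseline to compare a hand-rolled k-d tree against, SciPy's cKDTree supports k > 1 directly; a sketch with 10-dimensional data (the sizes are illustrative):

    import numpy as np
    from scipy.spatial import cKDTree

    X = np.random.rand(100000, 10)         # 10-dimensional data, as in the question
    tree = cKDTree(X)

    queries = np.random.rand(5, 10)
    dist, idx = tree.query(queries, k=20)  # 20 nearest neighbours per query point
    print(idx.shape)                       # (5, 20)
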
9 votes, 3 answers

K-nearest neighbour C/C++ implementation

Where can I find a serial C/C++ implementation of the k-nearest neighbour algorithm? Do you know of any library that has this? I have found OpenCV, but its implementation is already parallel. I want to start from a serial implementation and…
alexsardan • 253 • 1 • 4 • 7
8 votes, 1 answer

Problems with k-NN regression in R

I am trying to run knnreg from the package caret. For some reason, this training set works: > summary(train1) V1 V2 V3 13 : 10474 1 : 6435 7 : 8929 10 : 10315 2 : …
thecheech • 2,041 • 3 • 18 • 25
8 votes, 1 answer

Why is KNN much faster than decision tree?

Once, in an interview, I encountered a question from the employer. He asked me why a KNN classifier is much faster than a decision tree, for example in letter recognition or face recognition. I had no idea at the time. So I want to know…
zfz • 1,597 • 1 • 22 • 45
7 votes, 7 answers

K Nearest Neighbour Algorithm doubt

I am new to Artificial Intelligence. I understand the k nearest neighbour algorithm and how to implement it. However, how do you calculate the distance or weight for things that aren't on a scale? For example, the distance for age can be easily calculated,…
wai • 8,923 • 4 • 24 • 19
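One common convention (an assumption, not the only possible answer): treat a categorical mismatch as distance 1 and rescale numeric attributes to [0, 1] so the two kinds of feature are comparable; a tiny sketch with a hypothetical helper:

    def mixed_distance(a, b, age_range):
        # a, b: (age, colour) pairs; age_range: max - min of age in the data
        d_age = abs(a[0] - b[0]) / age_range      # numeric, rescaled to [0, 1]
        d_colour = 0.0 if a[1] == b[1] else 1.0   # categorical: match or mismatch
        return d_age + d_colour

    print(mixed_distance((25, "red"), (30, "blue"), age_range=60))
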
7 votes, 1 answer

Euclidean distance, different results between Scipy, pure Python, and Java

I was playing around with different implementations of the Euclidean distance metric and I noticed that I get different results for Scipy, pure Python, and Java. Here's how I compute the distance using Scipy (= option 1): distance =…
Silas Berger • 129 • 6
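A sketch for reproducing such discrepancies: the formula is identical everywhere, so differences typically trace back to floating-point precision (e.g. float32 somewhere in one pipeline), though that is a guess, not a diagnosis of this asker's case:

    import math
    import numpy as np
    from scipy.spatial import distance

    a = np.array([0.1, 0.2, 0.3])
    b = np.array([0.4, 0.5, 0.6])

    print(distance.euclidean(a, b))                             # SciPy
    print(math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))   # pure Python
    print(float(np.linalg.norm(a.astype(np.float32) - b.astype(np.float32))))  # float32 drifts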