Questions tagged [knn]

In pattern recognition, k-nearest neighbors (k-NN) is a classification algorithm used to classify examples based on a set of already-classified examples. Algorithm: a case is classified by a majority vote of its neighbors, with the case being assigned to the class most common among its k nearest neighbors, as measured by a distance function. If k = 1, the case is simply assigned to the class of its single nearest neighbor.

The idea of the k-nearest neighbors (k-NN) algorithm is to use the features of an example, which are known, to determine its classification, which is unknown.

First, a set of classified samples is supplied to the algorithm as a training set. When a new, unclassified sample is given, the algorithm finds the k nearest neighbors of the new sample among the training samples and determines its classification from the classes of those neighbors.

The algorithm is sometimes called a lazy classifier because it does nothing during "learning" beyond storing the samples; all the work is done at classification time.
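A minimal sketch of the whole procedure in Python (the toy data and choice of k are invented for illustration):

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, query, k=3):
        """Classify `query` by majority vote among its k nearest training samples."""
        # "Training" has already happened: X_train and y_train are simply stored.
        distances = np.linalg.norm(X_train - query, axis=1)  # Euclidean distances
        nearest = np.argsort(distances)[:k]                  # indices of the k closest samples
        votes = Counter(y_train[nearest])                    # count the class labels
        return votes.most_common(1)[0][0]                    # majority class

    # Toy example: two classes in a 2-D feature space.
    X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]])
    y_train = np.array(['a', 'a', 'b', 'b'])
    print(knn_classify(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 'a'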

Algorithm

The k-NN algorithm is among the simplest of all machine learning algorithms. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.

A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance).
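Both metrics are available in SciPy, for example (a sketch; the vectors are made up):

    from scipy.spatial.distance import euclidean, hamming

    # Continuous features: Euclidean distance.
    print(euclidean([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0

    # Discrete features (e.g. token indicators): Hamming distance,
    # the fraction of positions at which the vectors differ.
    print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))  # 0.5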

Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with a specialized algorithm such as Large Margin Nearest Neighbor or Neighbourhood Components Analysis.
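scikit-learn ships the latter as sklearn.neighbors.NeighborhoodComponentsAnalysis, so a learned-metric pipeline might look like this (a sketch; the dataset and hyperparameters are arbitrary):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
    from sklearn.pipeline import Pipeline

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Learn a linear transformation that improves k-NN accuracy, then classify.
    nca_knn = Pipeline([
        ('nca', NeighborhoodComponentsAnalysis(random_state=0)),
        ('knn', KNeighborsClassifier(n_neighbors=3)),
    ])
    nca_knn.fit(X_train, y_train)
    print(nca_knn.score(X_test, y_test))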


1775 questions
7 votes, 3 answers

Accuracy difference on normalization in KNN

I had trained my model with the KNN classification algorithm, and I was getting around 97% accuracy. However, I later noticed that I had missed out on normalising my data, so I normalised it and retrained my model, and now I am getting an accuracy of only…
Jibin Mathew
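Scaling changes which neighbors count as "nearest", so accuracy can move in either direction once the data is normalised. A sketch comparing both setups in scikit-learn (the dataset here is a stand-in for the asker's):

    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    raw = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    scaled = make_pipeline(StandardScaler(),
                           KNeighborsClassifier(n_neighbors=5)).fit(X_train, y_train)

    # Without normalisation, distances are dominated by large-scale features.
    print('raw:   ', raw.score(X_test, y_test))
    print('scaled:', scaled.score(X_test, y_test))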
7 votes, 0 answers

Sklearn KNeighborsRegressor custom distance metrics

I am using KNeighborsRegressor, but I would like to use it with a custom distance function. My training set is a pandas DataFrame which looks like: week_day hour minute temp humidity 0 1 9 0 1 1 1 9 0 …
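KNeighborsRegressor accepts a Python callable as metric; it is called with two 1-D arrays and must return a single distance. A sketch with an invented feature-weighted distance (the weights and random data are placeholders):

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    def my_distance(a, b):
        # Hypothetical metric: weight some features more than others.
        w = np.array([1.0, 0.5, 0.1, 2.0, 2.0])  # week_day, hour, minute, temp, humidity
        return np.sqrt(np.sum(w * (a - b) ** 2))

    X = np.random.rand(100, 5)   # placeholder training features
    y = np.random.rand(100)      # placeholder targets

    reg = KNeighborsRegressor(n_neighbors=3, metric=my_distance)
    reg.fit(X, y)
    print(reg.predict(X[:2]))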
7 votes, 1 answer

Why doesn't metric='precomputed' work in sk-learn's k-nearest neighbours?

I'm trying to fit a precomputed kernel matrix when using http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html; it is apparently possible since the metric 'precomputed' exists. It allows you to pass a…
Syzygy
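Note that metric='precomputed' expects distances, not kernel similarities, which is a common reason it appears not to work: a kernel matrix has to be converted to distances first. fit() takes a square train-to-train distance matrix, and predict() a rectangular test-to-train matrix. A sketch with random stand-in data:

    import numpy as np
    from sklearn.metrics import pairwise_distances
    from sklearn.neighbors import KNeighborsClassifier

    X_train = np.random.rand(20, 4)          # placeholder data
    y_train = np.random.randint(0, 2, 20)
    X_test = np.random.rand(5, 4)

    knn = KNeighborsClassifier(n_neighbors=3, metric='precomputed')
    knn.fit(pairwise_distances(X_train, X_train), y_train)   # shape (n_train, n_train)
    print(knn.predict(pairwise_distances(X_test, X_train)))  # shape (n_test, n_train)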
7 votes, 3 answers

How to use OpenCV FeatureDetector on tiny images

I am using OpenCV 3 in Java, and I am trying to find small images (like 25x25 pixels) in another image. But FeatureDetector detects a (0,0)-size Mat on the small image. Mat smallImage = ... FeatureDetector detector =…
RustamIS
7 votes, 2 answers

Broadcast Annoy object in Spark (for nearest neighbors)?

As Spark's MLlib doesn't have nearest-neighbors functionality, I'm trying to use Annoy for approximate nearest neighbors. I try to broadcast the Annoy object and pass it to workers; however, it does not operate as expected. Below is code for…
xenocyon
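Annoy indexes are not picklable, so broadcasting the object directly tends to fail; a common workaround is to save the index to a file, ship the file, and memory-map it on each worker. A sketch (the dimensionality, path, and tree count are placeholders):

    import random
    from annoy import AnnoyIndex
    from pyspark import SparkContext, SparkFiles

    sc = SparkContext()
    f = 100                                              # placeholder dimensionality
    vectors = [[random.random() for _ in range(f)] for _ in range(1000)]

    # Build and save the index on the driver.
    index = AnnoyIndex(f, 'angular')
    for i, vec in enumerate(vectors):
        index.add_item(i, vec)
    index.build(10)                                      # 10 trees, chosen arbitrarily
    index.save('/tmp/index.ann')
    sc.addFile('/tmp/index.ann')                         # ship the file, not the object

    def nearest(partition):
        # Re-load (mmap) the index once per partition on the worker.
        idx = AnnoyIndex(f, 'angular')
        idx.load(SparkFiles.get('index.ann'))
        for vec in partition:
            yield idx.get_nns_by_vector(vec, 5)

    print(sc.parallelize(vectors[:10]).mapPartitions(nearest).collect())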
7 votes, 4 answers

Template Matching for Coins with OpenCV

I am undertaking a project that will automatically count the values of coins from an input image. So far I have segmented the coins using some pre-processing with edge detection and the Hough transform. My question is: how do I proceed from here? I…
Leo
7 votes, 2 answers

searching for k nearest points

I have a large set of features that looks like this: id1 28273 20866 29961 27190 31790 19714 8643 14482 5384 .... up to 1000 id2 12343 45634 29961 27130 33790 14714 7633 15483 4484 .... id3 ..... ..... ..... ..... ..... ..... .... ..... .... ....…
Rafaelopasa
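One way to query the k nearest rows of such a feature matrix is scikit-learn's NearestNeighbors (a sketch with random stand-in data):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    X = np.random.rand(1000, 1000)      # stand-in for the id -> 1000-feature table

    nn = NearestNeighbors(n_neighbors=5).fit(X)
    distances, indices = nn.kneighbors(X[:1])   # 5 nearest rows to the first id
    print(indices, distances)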
7 votes, 3 answers

Handling Incomplete Data (Data Sparsity) in kNN

I am trying to create a simple recommender system using kNN. Let's say I have a table: User | Book1 | Book2 | Book3 | Book4 | Book5 | Book6 | Book7 | 1 | 5 | ? | 3 | ? | 4 | 3 | 2 | 2 | 3 | 4 | ? | 2…
iCodeLikeImDrunk
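One option for the missing ratings is scikit-learn's KNNImputer, which fills each '?' from the k most similar users (a sketch; the ratings are invented):

    import numpy as np
    from sklearn.impute import KNNImputer

    # Rows are users, columns are books; np.nan marks an unrated book.
    ratings = np.array([
        [5, np.nan, 3, np.nan, 4, 3, 2],
        [3, 4, np.nan, 2, np.nan, 4, 5],
        [4, 4, 3, 2, 5, np.nan, 2],
    ])

    imputer = KNNImputer(n_neighbors=2)
    print(imputer.fit_transform(ratings))   # '?' entries replaced by neighbor means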
6 votes, 3 answers

Obtaining `std::priority_queue` elements in reverse order?

I've written some K-nearest-neighbor query methods which build a list of points that are nearest to a given query point. To maintain that list of neighbors, I use the std::priority_queue such that the top element is the farthest neighbor to the…
Mikael Persson
6 votes, 2 answers

How can I use a different distance measure for the k-nearest neighbor in Java/Weka?

I am using the k-nearest neighbor classifier on weka (http://weka.sourceforge.net/doc.dev/weka/classifiers/lazy/IBk.html). I suppose the Euclidean distance is the default distance function. How could I change that function and use the same class…
Marco
6 votes, 4 answers

Why is KNN so much faster with cosine distance than Euclidean distance?

I am fitting a k-nearest neighbors classifier using scikit-learn and noticed that the fitting is faster, often by an order of magnitude or more, when using cosine similarity between two vectors compared to when using Euclidean similarity…
Simon Segert
6 votes, 1 answer

"Error: Must subset rows with a valid subscript vector" in preProcess() when using knnImpute

I'm using Kaggle's Pokémon data to practice kNN imputation via preProcess(), but when I did, I encountered the following message after the predict() step. I am wondering if I used an incorrect data format or if some columns have inappropriate…
Chris T.
6 votes, 3 answers

Providing user defined sample weights for knn classifier in scikit-learn

I am using the scikit-learn KNeighborsClassifier for classification on a dataset with 4 output classes. The following is the code that I am using: knn = neighbors.KNeighborsClassifier(n_neighbors=7, weights='distance', algorithm='auto',…
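KNeighborsClassifier has no per-sample weight argument, but `weights` may be a callable that maps the matrix of neighbor distances to a matrix of voting weights of the same shape. A sketch with an invented weighting scheme:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    def inverse_square(distances):
        # Hypothetical scheme: weight each neighbor's vote by 1 / d^2.
        return 1.0 / (distances ** 2 + 1e-12)   # epsilon avoids division by zero

    X, y = load_iris(return_X_y=True)
    knn = KNeighborsClassifier(n_neighbors=7, weights=inverse_square)
    knn.fit(X, y)
    print(knn.predict(X[:3]))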
6 votes, 2 answers

How to estimate eps using knn distance plot in DBSCAN

I have the following code to estimate eps for DBSCAN. If the code is fine, then I have obtained the knn distance plot. The code is: ns = 4 nbrs = NearestNeighbors(n_neighbors=ns).fit(data) distances, indices = nbrs.kneighbors(data) distanceDec =…
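The usual recipe is to sort each point's distance to its k-th neighbor and read eps off the "elbow" of the resulting curve. A sketch continuing the question's code (the data here is a stand-in):

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    data = np.random.rand(500, 2)            # stand-in for the question's data

    ns = 4
    nbrs = NearestNeighbors(n_neighbors=ns).fit(data)
    distances, indices = nbrs.kneighbors(data)

    # Column 0 is each point's distance to itself, so column ns-1 is the
    # (ns-1)-th real neighbor; the elbow of the sorted curve suggests eps.
    k_dist = np.sort(distances[:, ns - 1])
    plt.plot(k_dist)
    plt.ylabel(f'distance to {ns - 1}-th nearest neighbor')
    plt.show()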
6 votes, 1 answer

What are createBackgroundSubtractorKNN parameters in OpenCV C++?

I need an explanation of the parameters of createBackgroundSubtractorKNN(int history=500, double dist2Threshold=400.0, bool detectShadows=true). How do history, dist2Threshold, and detectShadows affect the background subtractor?
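The same constructor is exposed in OpenCV's Python bindings; a sketch of its use (the video path is a placeholder), with the parameters' documented meanings as comments:

    import cv2

    # history: number of recent frames used to model the background.
    # dist2Threshold: squared distance below which a pixel matches a background sample.
    # detectShadows: if True, shadows are detected and marked gray (127) in the mask.
    subtractor = cv2.createBackgroundSubtractorKNN(history=500,
                                                   dist2Threshold=400.0,
                                                   detectShadows=True)

    cap = cv2.VideoCapture('video.mp4')      # placeholder input
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)       # foreground mask for this frame
        cv2.imshow('foreground', mask)
        if cv2.waitKey(30) == 27:            # Esc quits
            break
    cap.release()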