Questions tagged [knn]

In pattern recognition, k-nearest neighbors (k-NN) is a classification algorithm used to classify an example based on a set of already classified examples. Algorithm: a case is classified by a majority vote of its neighbors, with the case being assigned to the class most common among its k nearest neighbors as measured by a distance function. If k = 1, the case is simply assigned to the class of its nearest neighbor.

The idea of the k-nearest neighbors (k-NN) algorithm is to use the features of an example, which are known, to determine its classification, which is unknown.

First, a set of classified samples is supplied to the algorithm. When a new, unclassified sample is given, the algorithm finds the k nearest neighbors of that sample and determines its classification from the classifications of those neighbors, which were given as the training set.

The algorithm is sometimes called lazy classification because it does nothing during "learning" beyond storing the samples; all the work is done during classification.
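This lazy behaviour can be sketched in a few lines of Python (the class name and the toy data below are hypothetical, chosen only for illustration): `fit` merely stores the samples, while `predict` does all the distance computation and majority voting.

```python
from collections import Counter
import math

class KNNClassifier:
    """Minimal k-NN classifier: 'training' only stores the data."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learning: just remember the samples and their labels.
        self.X = X
        self.y = y
        return self

    def predict(self, query):
        # Distance from the query to every stored sample (Euclidean).
        dists = [math.dist(query, x) for x in self.X]
        # Indices of the k closest samples, then a majority vote on their labels.
        nearest = sorted(range(len(dists)), key=dists.__getitem__)[: self.k]
        votes = Counter(self.y[i] for i in nearest)
        return votes.most_common(1)[0][0]

clf = KNNClassifier(k=3).fit([[0, 0], [1, 0], [5, 5], [6, 5]],
                             ["a", "a", "b", "b"])
print(clf.predict([0.5, 0.2]))  # neighbors vote "a", "a", "b" -> "a"
```

With k = 1 the vote degenerates to copying the label of the single nearest sample, as described above.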

Algorithm

The k-NN algorithm is among the simplest of all machine learning algorithms. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.
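The two phases map directly onto scikit-learn's `KNeighborsClassifier` (a sketch assuming scikit-learn is installed; the data are made up for illustration), where the user-defined constant k is the `n_neighbors` parameter:

```python
from sklearn.neighbors import KNeighborsClassifier

# Training phase: store labelled feature vectors.
X_train = [[0, 0], [1, 0], [5, 5], [6, 5]]
y_train = ["a", "a", "b", "b"]

clf = KNeighborsClassifier(n_neighbors=3)  # k = 3
clf.fit(X_train, y_train)

# Classification phase: label each query point by majority vote
# among its 3 nearest training samples.
print(clf.predict([[0.5, 0.2]]))  # ['a']
```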

A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as in text classification, another metric can be used, such as the overlap metric (or Hamming distance).
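Both metrics are a few lines of plain Python (function names here are illustrative):

```python
import math

def euclidean(a, b):
    # Straight-line distance between two continuous feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    # Overlap metric for discrete features: number of positions that differ.
    return sum(x != y for x, y in zip(a, b))

print(euclidean([0, 0], [3, 4]))      # 5.0
print(hamming("karolin", "kathrin"))  # 3
```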

Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor (LMNN) or Neighbourhood Components Analysis (NCA).
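scikit-learn ships NCA as `NeighborhoodComponentsAnalysis`, which can be chained with k-NN in a pipeline (a sketch assuming scikit-learn is installed; the dataset and split are only an example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Learn a linear transformation of the feature space with NCA,
# then run ordinary k-NN in the transformed space.
nca_knn = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(random_state=42)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
nca_knn.fit(X_train, y_train)
acc = nca_knn.score(X_test, y_test)
print(acc)
```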


1775 questions
0 votes · 3 answers

KNN Accuracy of 100%?

I have used the following code for KNN jd <- jobdata head (jd) jd$ipermanency rate= as.integer(as.factor(jd$ipermanency rate)) jd$`permanency rate`=as.integer(as.factor(jd$`permanency rate`)) jd$`job skills`=as.integer(as.factor(jd$`job…
A Kelly

0 votes · 1 answer

postgis difference in terms of precision and performance between geography and geometry

I developed a telegram bot that match 2 users with the same language and when they both are looking for a partner with a postgres query. I would like to add the optional ability to match users also depending on location (closest user). Since it is…
91DarioDev

0 votes · 1 answer

String Data Training for KNN Classification : Python

i have been trying to learn to train my data i.e implement machine learning which has string data. all i could understand was, you can convert the string data type to categorical, but i am unable to do it using LabelEncoder. and i heard that we…
0 votes · 0 answers

How to get Closest K matches( K nearest Neighbors) among many choices of points?

I could not come up with an algorithm for the scenario I have. Problem I have: I have a movie name which is associated with features like emotions,imdb rating,user rating etc.This is how it looks MovieName | Rating | Polarity | Action |…
0 votes · 0 answers

Incorrect character detection in OCR project

I have done a DNI text recognition project using Visual Studio, the OpenCV library and the KNN recognition method. To simplify the project I cut the fields whose information I want to extract and I did the whole process to each of them, obtaining…
0 votes · 1 answer

Python Scikit learn Knn nearest neighbor regression

I am using the Nearest Neighbor regression from Scikit-learn in Python with 20 nearest neighbors as the parameter. I trained the model and then saved it using this code: knn = neighbors.KNeighborsRegressor(n_neighbors,…
0 votes · 1 answer

Encode each training image as a histogram of the number of times each vocabulary element shows up for Bag of Visual Words

I want to implement bag of visual words in MATLAB. I used SURF features to extract features from the images and k-means to cluster those features into k clusters. I now have k centroids and I want to know how many times each cluster is used by…
Daniel.V

0 votes · 1 answer

voting procedure for kNN calssification in python(array.argsort)

Im pretty much a beginner, but Im stumped and I feel like I shouldnt be. I am learning about the kNN algorithm using numpy. The code is the following: def kNN_classify(query, dataset, labels, k): dataSetSize = dataset.shape[0] diffMat =…
BioHazZzZard

0 votes · 1 answer

Error in R code for creating a model using KNN algorithm for text mining

I am trying to do some text mining over 35000 rows of data, and when i try to create the model from modeldata, I take the rows that I decided to use for training and test. And, I also feed the known categories of the training data into the model. I…
Renato Lyke

0 votes · 0 answers

k-NN regression using matlab?

I have two table namely Training_table and Testing table each contains two parameters of size say 100. I want to use k-NN for training using training_table and test the algorithm using 'Testing_table' values. let us consider, In Training table x =…
Ankita

0 votes · 0 answers

Resampling results across tuning parameters

Since I'm new to machine learning and I'm trying to build the model using caret package available in R. I have a small confusion when using repeated cv method. Resampling output shows desired parameters which has highest accuracy. I did not clearly…
0 votes · 2 answers

Pyplot truth value of an array with more than one element is ambiguous

I am trying to implement a knn 1D estimate: # nearest neighbors estimate def nearest_n(x, k, data): # Order dataset #data = np.sort(data, kind='mergesort') nnb = [] # iterate over all data and get k nearest neighbours around x …
nik.yan

0 votes · 1 answer

Flink: ERROR parse numeric value format

I'm trying to develop a K-means model in Flink (Scala), using Zeppelin. This is part of my simple code: //Reading data val mapped : DataSet[Vector] = data.map {x => DenseVector (x._1,x._2) } //Create algorithm val knn = KNN() .setK(3) …
Borja

0 votes · 0 answers

Sklearn KNN with null values

I have a dataset of growth rates by time and individual. I'm trying to use KNN to predict growth rates based on historical growth for other individuals. To start, I transformed my transaction-level dataset so that each row represents an individual,…
flyingmeatball

0 votes · 1 answer

kknn - Iris example - what is the value of k?

I'm trying to learn how to use kknn and am working through the iris example in the documentation. Meanwhile, my homework requires me to apply kknn to a dataset and iteratively supply k values and test to find the best one. While supplying a value…
James