Questions tagged [knn]

In pattern recognition, k-nearest neighbors (k-NN) is a classification algorithm used to classify examples based on a set of already classified examples. Algorithm: a case is classified by a majority vote of its neighbors, being assigned to the class most common amongst its K nearest neighbors as measured by a distance function. If K = 1, the case is simply assigned to the class of its single nearest neighbor.

The idea of the k-nearest neighbors (k-NN) algorithm is to use the known features of an example to determine its unknown classification.

First, some classified samples are supplied to the algorithm. When a new, unclassified sample is given, the algorithm finds the k nearest neighbors of the new sample and determines its classification from the classes of those neighbors, which were supplied as the training set.

The algorithm is sometimes called a lazy classifier because it does nothing during "learning" beyond storing the samples; all the work is done at classification time.

Algorithm

The k-NN algorithm is among the simplest of all machine learning algorithms. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.
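The two phases above can be sketched in a few lines (a minimal illustration in Python with NumPy, using Euclidean distance and simple majority voting; the data is made up, and no tie-breaking beyond `Counter`'s ordering is attempted):

```python
from collections import Counter
import numpy as np

def knn_classify(X_train, y_train, x_query, k=3):
    """Label x_query by majority vote among its k nearest training points."""
    # The training phase is just storage; all work happens here, at query time.
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["a", "a", "b", "b"])
print(knn_classify(X_train, y_train, np.array([4.8, 5.1]), k=3))  # → b
```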

A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance).
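For discrete attributes, the overlap metric simply counts mismatched positions; a sketch (the toy attribute tuples below are invented for illustration):

```python
def hamming(u, v):
    """Overlap metric: the number of positions where the attributes differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

# Two attribute encodings of short documents (toy example):
print(hamming(("spam", "short", "html"), ("spam", "long", "html")))  # → 1
```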

Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood components analysis.
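scikit-learn ships an implementation of Neighbourhood Components Analysis that can be chained with a k-NN classifier; a sketch (assuming scikit-learn ≥ 0.21, and using the bundled iris data rather than any particular dataset):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Learn a linear transform that improves nearest-neighbor accuracy,
# then run k-NN in the transformed space.
model = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```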


1775 questions
0 votes, 0 answers

K value vs Accuracy in KNN

I am trying to learn KNN by working on the Breast Cancer dataset provided by the UCI repository. The total size of the dataset is 699, with 9 continuous variables and 1 class variable. I tested my accuracy on a cross-validation set. For K = 21 & K = 19. Accuracy is…
Rahul Saxena
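A common way to study accuracy as a function of K is to cross-validate over a grid of odd K values; a sketch with scikit-learn, using the bundled `load_breast_cancer` (the Wisconsin diagnostic data) as a stand-in for the questioner's file:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
scores = {}
for k in range(1, 31, 2):  # odd K avoids voting ties in a binary problem
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=10).mean()

best_k = max(scores, key=scores.get)  # K with the highest mean CV accuracy
print(best_k, round(scores[best_k], 3))
```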
0 votes, 0 answers

How to split an array for KNN?

I'm using the brute-force algorithm for KNN to find nearest neighbors in my web service. One downside of this approach is that I need enough memory on every machine to load the whole array for KNN. Now I'm thinking of splitting the array, do KNN…
satoru
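The split-and-merge idea the question gestures at is sound: find the k nearest candidates in each shard independently, then re-select the overall top k from the merged candidates. A single-process sketch of that logic (shard boundaries and points below are invented for illustration):

```python
import heapq
import math

def knn_shard(shard, query, k):
    """Top-k (distance, point) pairs within one shard, brute force."""
    dists = [(math.dist(p, query), p) for p in shard]
    return heapq.nsmallest(k, dists)

def knn_sharded(shards, query, k):
    """Merge per-shard candidates; the global top k must be among them."""
    candidates = [pair for shard in shards for pair in knn_shard(shard, query, k)]
    return [p for _, p in heapq.nsmallest(k, candidates)]

shards = [[(0, 0), (1, 1)], [(5, 5), (0.5, 0.5)], [(9, 9)]]
print(knn_sharded(shards, (0, 0), k=2))  # → [(0, 0), (0.5, 0.5)]
```

In a real service each `knn_shard` call would run on a separate machine, and only the tiny candidate lists travel over the network.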
0 votes, 1 answer

How to copy training data n times during classification?

I have a classification problem with two data sets of 200 and 50 points respectively. Out of these, 40 data points are taken as the test set. I have chosen kNN as the classifier, considering five nearest neighbors. n_neighbors = 5 std = 5 # generate…
0 votes, 0 answers

How to enhance the accuracy of knn classifier?

My homework is to write code in Matlab to calculate the accuracy of the knn classifier if my data is as follows: Training data Data length: 6 seconds, 3 channels, 768 samples / trial, 140 tests, fs = 128 Hz Test data: 3 channels, 1152 samples /…
lavender
0 votes, 1 answer

OpenCV C++ person recognition with k-NN

I'm trying to write a simple program for person recognition using the k-NN algorithm. I think this problem is a classic one, but I need some help. The k-NN classifier needs to compute some distances, so my question is, how to compare, or how to compute…
Elneny
0 votes, 0 answers

Not able to use KNN from sci-kit learn on dataset with vector data in array(double) format

I get the following error when attempting to run KNN on my dataset. Error: setting an array element with a sequence This is how my data appears: This is the dtypes of my pandas df: f_vector object label int64 mi_vector …
Vikaasa Ramdas
0 votes, 0 answers

KNN cross validation takes too long in R

I'm new to machine learning and I was successful in building a KNN classifier. Now I wanted to implement n-fold cross validation, but it was taking too long to do so in R. Is there a more efficient way of doing it? Below is my code (been running for 30…
misctp asdas
0 votes, 1 answer

How to do N Cross validation in KNN python sklearn?

I'm new to machine learning and I'm trying to run the KNN algorithm on the KDD Cup 1999 dataset. I managed to create the classifier and predict the dataset with a result of roughly 92% accuracy. But I observed that my accuracy may not be accurate as the…
misctp asdas
0 votes, 1 answer

MinMax Scaler in sklearn does not normalize values of column between 0 and 1

I'm working on the KNN algorithm in Python and tried to normalise my data frames with MinMaxScaler to transform the data into a range between 0 and 1. However, when I return the output, I observe that for some columns' min / max the output exceeds 1. Am I using…
misctp asdas
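The behaviour described above is expected: `MinMaxScaler` guarantees the [0, 1] range only for the data it was fitted on; transforming unseen rows whose values fall outside the fitted min/max produces outputs outside [0, 1]. A sketch:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[1.0], [3.0], [5.0]])
test = np.array([[6.0]])  # larger than anything seen at fit time

scaler = MinMaxScaler()
print(scaler.fit_transform(train).ravel())  # → [0.  0.5 1. ]
print(scaler.transform(test).ravel())       # → [1.25]  (outside [0, 1])
```

Fitting the scaler on the training data only, and merely transforming the test data, is still the right procedure; clipping or refitting would leak test-set information.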
0 votes, 1 answer

knn algorithm error percentage

I have developed a knn algorithm for my data set. My data set contains 5000 × 17 values. In this data set, I divide my data as 4000 for validation and 1000 for training. My question is, at the end my error percentage is 0.0158 for training data. Does…
Muaa2404
0 votes, 1 answer

Varying n_neighbors in scikit-learn KNN regression

I am using scikit-learn's KNN regressor to fit a model to a large dataset with n_neighbors = 100-500. Given the nature of the data, some parts (think: sharp delta-function like peaks) are better fit with fewer neighbors (n_neighbors ~ 20-50) so that…
saud
0 votes, 1 answer

"The format of predictions is incorrect"

Implementation of a ROCR curve, kNN, and 10-fold cross validation. I am using the Ionosphere dataset. Here is the attribute information for your reference: -- All 34 are continuous, as described above -- The 35th attribute is either "good" or "bad"…
codingyo
0 votes, 1 answer

How to predict KNN classifier without using built-in function

I have some trouble predicting with a KNN classifier without using a built-in function. I got stuck here and have no idea how to go to the next step. Here is my code: % calculate Euclidean distance dist = pdist2(test, train, 'euclidean'); for k = [1 3 5 7] …
BigD
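In Python terms, the distance-matrix-then-vote loop the question starts from might look like this (a sketch with NumPy and SciPy, where `cdist` plays the role of Matlab's `pdist2`; the data is invented for illustration):

```python
import numpy as np
from scipy.spatial.distance import cdist

train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
labels = np.array([0, 0, 1, 1])
test = np.array([[0.2, 0.1], [5.5, 5.0]])

dist = cdist(test, train, "euclidean")         # like pdist2(test, train, 'euclidean')
for k in [1, 3]:
    nearest = np.argsort(dist, axis=1)[:, :k]  # k closest train rows per test row
    # Majority vote over the neighbors' labels for each test point:
    preds = [int(np.bincount(labels[row]).argmax()) for row in nearest]
    print(k, preds)  # → 1 [0, 1] then 3 [0, 1]
```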
0 votes, 1 answer

How to select needed values out of a .txt file?

txt file containing the characteristics (estrato, size, age, bedrooms, bathrooms, floor, and label) of 205 apartments in Medellín, Colombia, and I need to load the file and save each of the characteristics in their own specific variable so I can pass…
0 votes, 1 answer

A string distance function for kNN implementation for a cookbook-app recommendation engine

Basically, I want to use a kNN algorithm to build something like a search recommendation engine for a cookbook app. The idea is that if the user inputs something like "chicken", "chicken breast", "chicken meat", "boneless chicken", etc., their…
Ilya Lapan
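A common choice of string distance for this kind of fuzzy matching is Levenshtein (edit) distance; a minimal nearest-neighbour lookup over recipe names (pure Python; the vocabulary below is a toy stand-in, not the app's actual data):

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def nearest(query, vocabulary, k=3):
    """The k vocabulary entries closest to the query in edit distance."""
    return sorted(vocabulary, key=lambda w: levenshtein(query, w))[:k]

dishes = ["chicken breast", "chicken soup", "beef stew", "boneless chicken"]
print(nearest("chiken breast", dishes, k=2))  # → ['chicken breast', 'chicken soup']
```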