Questions tagged [knn]

In pattern recognition, k-nearest neighbors (k-NN) is a classification algorithm used to classify examples based on a set of already classified examples. Algorithm: a case is classified by a majority vote of its neighbors, with the case being assigned to the class most common among its k nearest neighbors, as measured by a distance function. If k = 1, the case is simply assigned to the class of its single nearest neighbor.

The idea of the k-nearest neighbors (k-NN) algorithm is to use the known features of an example to determine its unknown classification.

First, a set of classified samples (the training set) is supplied to the algorithm. When a new, unclassified sample is given, the algorithm finds the k nearest neighbors to the new sample among the training samples and determines its classification from their labels.

The algorithm is sometimes called a lazy classifier because it does nothing during "learning" beyond storing the samples; all the work is done at classification time.

Algorithm

The k-NN algorithm is among the simplest of all machine learning algorithms. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.
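
The two phases can be sketched in a few lines of Python with NumPy (a minimal sketch; the function name `knn_predict` and the toy data are made up for this illustration):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training vector
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their class labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: two 2-D clusters labelled 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 0.9]), k=3))  # prints 1
```

Note that the "training phase" here is just keeping `X` and `y` around, which is exactly the lazy-learning behaviour described above.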

A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance).
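
For instance, both metrics can be computed with SciPy (assuming SciPy is available; the vectors are arbitrary examples):

```python
import numpy as np
from scipy.spatial.distance import euclidean, hamming

# Euclidean distance between two continuous feature vectors
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 5.0])
print(euclidean(a, b))  # 2.0

# Hamming distance between two discrete vectors:
# the fraction of positions at which they differ
u = np.array([1, 0, 1, 1])
v = np.array([1, 1, 1, 0])
print(hamming(u, v))  # 0.5
```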

Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood Components Analysis.
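
One way to sketch this idea is with scikit-learn's `NeighborhoodComponentsAnalysis` estimator (available in scikit-learn 0.21 and later), which learns a linear transformation of the features before k-NN classification; the Iris data and parameter values below are chosen only for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn a distance-improving linear transformation, then classify with k-NN
clf = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out data
```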


1775 questions
0 votes, 1 answer

Converting string into a type supported by KNeighborsClassifier in Python

I have converted an object of KNeighborsClassifier into a string to send it from client to server. I found it to be an incompatible datatype when I use the received data on the server side. Program at the client…
Neenu
0 votes, 1 answer

How to use metric='correlation' with KNeighborsClassifier

I am trying to use KNeighborsClassifier(n_neighbors=15, algorithm='ball_tree', metric='correlation') However, I get the error ValueError: Metric 'correlation' not valid for algorithm 'ball_tree' Why is it not possible to use ball_tree? Am I limited…
Mike El Jackson
0 votes, 1 answer

k nearest neighbours in multithread program

Given a training set and a test point T, which needs to be classified: if I divide the training set into n parts, then run the knn algorithm (k=1) on each part and after that compare the results from each part, would it give me the same results as if I run…
gunner308
0 votes, 1 answer

KNN using R - in Production

I have some dummy data that consists of 99 rows of data, one column is free text data and one column is the category. It has been categorised into either Customer Service or Not Customer Service related. I passed the 99 rows of data into my R…
Richard
0 votes, 0 answers

KNeighbors: ValueError 'continuous-multioutput'

/home/dogus/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py:429: DataConversionWarning: Data with input dtype int8 was converted to float64 by the normalize function. warnings.warn(msg, _DataConversionWarning) Traceback…
Doğuş
0 votes, 1 answer

How to replace obsolete knnclassify() function of matlab with the new fitcknn() function?

I have a sample matrix, training matrix and a group matrix. I have used the obsolete knnclassify() function. I would like to replace it with the fitcknn() function. I'm new to matlab. How does the fitcknn() method work and what are the changes that…
0 votes, 1 answer

SVM train and predict using OpenCvLib320 in Android

I am using Android Studio with the OpenCV SDK, version 3.2. I am trying to train using the readymade API available for SVM. Part of my code retrieves the image from the internal memory of the phone and helps to train using SVM. File mnist_images_file =…
0 votes, 2 answers

How to classify single text using classifier algorithms

I have a set of documents which is clustered. Now each document has a label. I wanted to build a classifier based on this, train and test it so it works fine and a new document/text falls into the proper cluster. So I used countVectorizer to…
Nitesh kumar
0 votes, 1 answer

Would Sklearn GridSearchCV go through all the possible default options of the estimator's parameters?

Algorithms in scikit-learn might have some parameters that have default range of options, sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None,…
Adam Liu
0 votes, 0 answers

Decision boundary for KNN classifier

I'm trying to plot the decision boundary for a KNN classifier using MATLAB. I have searched online and I found that Voronoi diagrams are used to plot the decision boundary. But I have one question: suppose I have 2-dimensional data like this: class…
0 votes, 2 answers

scikit-learn: How to classify data train and data test with a different features?

My data train: 3 features (permanent data). My data test: it changes every time (2 features or 1 feature); in my example code it's 2 features now. I want to classify with a different feature, because it's a different dimension. How can I achieve this?…
FBillyan
0 votes, 0 answers

Error in knn(trainDataX, validationDataX, trainingType, k = 3) : 'train' and 'class' have different lengths

I am learning how to use the R programming language. Now I am learning the PCA technique; the question was to train a model with the knn technique and to find out which is the accuracy value for the training model. Here is my code; actually I am not pretty…
0 votes, 0 answers

Efficient kNN graph construction with deferred selection of k

Using Levenshtein distance as a metric, I want to find the exact k-nearest neighbours for all elements in a large set of strings, but I am not yet sure how high a value of k I will require. Is there an algorithm or data structure that will allow me…
Aramdooli
0 votes, 0 answers

Error in using knn for multidimensional data

I am a beginner in machine learning, trying to classify multidimensional data into two classes. Each data point is 40x6 float values. To begin with, I have read my csv file. In this file the shot number represents a data point.…
0 votes, 1 answer

Masking 2d arrays using boolean array

I am trying to implement k-fold validation in Python for the CIFAR-10 training set, so I am trying to mask the training set using a boolean list. X_Train_folds is an array of shape 5X1000X3072 selection = [True, True, True, True,…
Ravin Kohli