Questions tagged [knn]

In pattern recognition, k-nearest neighbors (k-NN) is a classification algorithm used to classify examples based on a set of already classified examples. Algorithm: a case is classified by a majority vote of its neighbors, with the case being assigned to the class most common among its k nearest neighbors, as measured by a distance function. If k = 1, the case is simply assigned to the class of its single nearest neighbor.

The idea of the k-nearest neighbors (k-NN) algorithm is to use the known features of an example to determine its unknown classification.

First, a set of classified samples (the training set) is supplied to the algorithm. When a new, unclassified sample is given, the algorithm finds the k nearest neighbors to the new sample among the training samples and determines its classification from their labels.

The algorithm is sometimes called a lazy classifier because it does nothing during "learning" beyond storing the samples; all the work is done at classification time.

Algorithm

The k-NN algorithm is among the simplest of all machine learning algorithms. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.
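
The two phases can be sketched in a few lines of Python with NumPy (a minimal sketch; the function name `knn_predict` and the toy data are made up for this illustration):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training vector
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their class labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: two 2-D clusters labelled 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 0.9]), k=3))  # prints 1
```

Note that the "training phase" here is just keeping `X` and `y` around, which is exactly the lazy-learning behaviour described above.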

A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance).
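
For instance, both metrics can be computed with SciPy (assuming SciPy is available; the vectors are arbitrary examples):

```python
import numpy as np
from scipy.spatial.distance import euclidean, hamming

# Euclidean distance between two continuous feature vectors
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 5.0])
print(euclidean(a, b))  # 2.0

# Hamming distance between two discrete vectors:
# the fraction of positions at which they differ
u = np.array([1, 0, 1, 1])
v = np.array([1, 1, 1, 0])
print(hamming(u, v))  # 0.5
```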

Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood Components Analysis.
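
One way to sketch this idea is with scikit-learn's `NeighborhoodComponentsAnalysis` estimator (available in scikit-learn 0.21 and later), which learns a linear transformation of the features before k-NN classification; the Iris data and parameter values below are chosen only for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn a distance-improving linear transformation, then classify with k-NN
clf = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out data
```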


1775 questions
0 votes, 1 answer

Converting string into a type supported by KNeighborsClassifier in Python

I have converted an object of KNeighborsClassifier into a string to send it from client to server. I found it to be an incompatible datatype when I use the received data on the server side. Program at the client…
Neenu
0 votes, 1 answer

How to use metric='correlation' with KNeighborsClassifier

I am trying to use KNeighborsClassifier(n_neighbors=15, algorithm='ball_tree', metric='correlation') However, I get the error ValueError: Metric 'correlation' not valid for algorithm 'ball_tree' Why is it not possible to use ball_tree? Am I limited…
Mike El Jackson
0 votes, 1 answer

k nearest neighbours in multithread program

Given a training set and a test point T, which needs to be classified: if I divide the training set into n parts, then run the knn algorithm (k=1) on each part and after that compare the results from each part, would it give me the same results as if I run…
gunner308
0 votes, 1 answer

KNN using R - in Production

I have some dummy data that consists of 99 rows of data, one column is free text data and one column is the category. It has been categorised into either Customer Service or Not Customer Service related. I passed the 99 rows of data into my R…
Richard
0 votes, 0 answers

KNeighbors: ValueError 'continuous-multioutput'

/home/dogus/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py:429: DataConversionWarning: Data with input dtype int8 was converted to float64 by the normalize function. warnings.warn(msg, _DataConversionWarning) Traceback…
Doğuş
0 votes, 1 answer

How to replace obsolete knnclassify() function of matlab with the new fitcknn() function?

I have a sample matrix, training matrix and a group matrix. I have used the obsolete knnclassify() function. I would like to replace it with the fitcknn() function. I'm new to matlab. How does the fitcknn() method work and what are the changes that…
0 votes, 1 answer

SVM train and predict using OpenCvLib320 in Android

I am using Android Studio with the OpenCV SDK, version 3.2. I am trying to train using the readymade API available for SVM. Part of my code retrieves the image from the internal memory of the phone and helps to train using SVM. File mnist_images_file =…
0 votes, 2 answers

How to classify single text using classifier algorithms

I have a set of documents which is clustered. Now each document has a label. I wanted to build a classifier based on this, train and test it so it works fine and a new document/text falls into the proper cluster. So I used countVectorizer to…
Nitesh kumar
0 votes, 1 answer

Would Sklearn GridSearchCV go through all the possible default options of the estimator's parameters?

Algorithms in scikit-learn might have some parameters that have default range of options, sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None,…
Adam Liu
0 votes, 0 answers

Decision boundary for KNN classifier

I'm trying to plot the decision boundary for a KNN classifier using MATLAB. I have searched online and I found that Voronoi diagrams are used to plot the decision boundary. But I have one question: suppose I have 2-dimensional data like this: class…
0 votes, 2 answers

scikit-learn: How to classify data train and data test with a different features?

My data train: 3 features (permanent data). My data test: it changes every time (2 features or 1 feature); in my example code it's 2 features now. I want to classify with a different feature, because it's a different dimension. How can I achieve this?…
FBillyan
0 votes, 0 answers

Error in knn(trainDataX, validationDataX, trainingType, k = 3) : 'train' and 'class' have different lengths

I am learning how to use the R programming language. Now I am learning the PCA technique; the question was to train a model with the knn technique and to find out which is the accuracy value for the training model. Here is my code; actually I am not pretty…
0 votes, 0 answers

Efficient kNN graph construction with deferred selection of k

Using Levenshtein distance as a metric, I want to find the exact k-nearest neighbours for all elements in a large set of strings, but I am not yet sure how high a value of k I will require. Is there an algorithm or data structure that will allow me…
Aramdooli
0 votes, 0 answers

Error in using knn for multidimensional data

I am a beginner in machine learning, trying to classify multidimensional data into two classes. Each data point is 40x6 float values. To begin with, I have read my csv file. In this file the shot number represents a data point.…
0 votes, 1 answer

Masking 2d arrays using boolean array

I am trying to implement k-fold validation in Python for the CIFAR-10 training set, so I am trying to mask the training set using a boolean list. X_Train_folds is an array of shape 5X1000X3072 selection = [True, True, True, True,…
Ravin Kohli