Questions tagged [knn]

In pattern recognition, k-nearest neighbors (k-NN) is a classification algorithm used to classify examples based on a set of already classified examples. Algorithm: a case is classified by a majority vote of its neighbors, and is assigned to the class most common among its k nearest neighbors, as measured by a distance function. If k = 1, the case is simply assigned to the class of its single nearest neighbor.

The idea of the k-nearest neighbors (k-NN) algorithm is to use the features of an example, which are known, to determine its classification, which is unknown.

First, a set of classified samples is supplied to the algorithm. When a new, unclassified sample is given, the algorithm finds the k nearest neighbors of the new sample and determines what its classification should be, according to the classifications of the samples supplied as the training set.

The algorithm is sometimes called a lazy classifier because it does nothing during "learning": it just stores the samples, and all the work is done at classification time.
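This lazy behavior is easy to see in scikit-learn (a minimal sketch, assuming scikit-learn is installed; the toy data is made up for illustration): `fit` merely stores the training samples, and the neighbor search and majority vote happen in `predict`.

```python
# "Lazy" k-NN with scikit-learn: fit() only stores the data,
# the real work happens in predict().
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)                      # training: just stores the samples
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))   # classification: vote among 3 nearest
```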

Algorithm

The k-NN algorithm is among the simplest of all machine learning algorithms. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.
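The two phases above can be sketched from scratch (plain Python; the helper name `knn_classify` and the toy data are illustrative, not from any library): training is just keeping the labeled vectors, and classification is a sort by distance followed by a majority vote.

```python
# From-scratch sketch of the classification phase: find the k nearest
# stored samples to the query point and take a majority vote over labels.
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: feature vector."""
    # Sort stored samples by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_classify(train, (0.4, 0.4)))  # → a
```

Note that `math.dist` requires Python 3.8 or later.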

A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance).
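A minimal sketch of the two metrics just mentioned (plain Python; the function names are illustrative): Euclidean distance for continuous feature vectors, and Hamming distance counting the positions at which two equal-length sequences differ.

```python
# Euclidean distance for continuous features; Hamming (overlap) distance
# for discrete features such as characters of equal-length strings.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    # Number of positions at which the two sequences differ.
    return sum(x != y for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))      # → 5.0
print(hamming("karolin", "kathrin"))  # → 3
```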

Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood Components Analysis.
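As a hedged sketch of the second option: scikit-learn ships Neighbourhood Components Analysis as `NeighborhoodComponentsAnalysis`, which can be chained with k-NN in a `Pipeline` so the neighbor vote happens in the learned metric space (the iris data below is used purely for illustration).

```python
# Metric learning with NCA, then k-NN in the learned space.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
nca_knn = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(random_state=0)),  # learn the metric
    ("knn", KNeighborsClassifier(n_neighbors=3)),             # vote in that space
])
nca_knn.fit(X, y)
print(nca_knn.score(X, y))  # training accuracy after metric learning
```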

1775 questions
0 votes, 1 answer

Changing color in Scikit's example for plotting decision boundaries of a VotingClassifier?

Hi, I am trying to reproduce Scikit's example for plotting decision boundaries of Voting Classifiers. The classification part is rather straightforward, and the neat way of plotting several plots in a single figure is intriguing. However, I have…
Rachel
0 votes, 0 answers

Applying user-defined weights in KNeighborsClassifier in sklearn

Can anyone please guide me in writing a user-defined function to provide weights for KNeighborsClassifier in Python (sklearn)? If possible, please explain with an example.
NehaK
0 votes, 1 answer

KNN edges / graph

I have created the dissimilarity matrix for my data. I am trying to get the set of edges resulting from the KNN method. I have read that KNN is a supervised learning method. So, from all the literature I have been reading, there is this training…
0 votes, 1 answer

Output from scikit-learn ML algorithms

I want to know whether there is any way to see "under the hood" of other Python sklearn algorithms. For example, I have created a decision tree classifier using sklearn and have been able to export the exact structure of the tree, but would like to also be…
Sjoseph
0 votes, 1 answer

Computing the Euclidean distance for KNN

I've been seeing a lot of examples of computing Euclidean distance for KNN, but none for sentiment classification. For example, I have a sentence "a very close game". How do I compute the Euclidean distance to the sentence "A great game"?
0 votes, 1 answer

knnimpute package in python

I am trying to perform a knn imputation using the Python package knnimpute. I am kind of lost as to what the parameter missing_mask should be. I fail to understand what this means (from the docs): missing_mask : np.ndarray Boolean array…
Indi
0 votes, 1 answer

How to successfully run an ML algorithm with a medium-sized data set on a mediocre laptop?

I have a Lenovo IdeaPad laptop with 8 GB RAM and an Intel Core i5 processor. I have 60k data points, each 100-dimensional. I want to do KNN, and for it I am running the LMNN algorithm to find a Mahalanobis metric. The problem is that after 2 hours of running, a blank…
Fenil
0 votes, 0 answers

Classify image with KNN?

I want to classify an image as a specific letter/number using the KNN algorithm. The problem is that the dataset is all png images and I don't know how I could apply this algorithm to this dataset. Should I convert all my dataset and the image I want…
Leonardo Burrone
0 votes, 1 answer

Tensorflow KNN : How can we assign the K parameter for defining number of neighbors in KNN?

I have started working on a machine learning project using the K-Nearest-Neighbors method with the Python TensorFlow library. I have no experience working with TensorFlow tools, so I found some code on GitHub and modified it for my data. My dataset is like…
Masoud Masoumi Moghadam
0 votes, 0 answers

Use custom function on data.table

I would like to ask if it is possible to apply this function using a data.table approach: myfunction <- function(i) { a <- test.dt[i, 1:21, with = F] final <- t((t(b) == a) * value) final[is.na(final)] <- 0 sum.value <- rowSums(final) …
Lëmön
0 votes, 1 answer

Retrieving distance matrix from kknn model

Is it possible to retrieve the distance matrix from the kknn model when using the mlr package in R with cross-validation? library("mlr") data(iris) task = makeClassifTask(data = iris, target = "Species") lnr = makeLearner( cl = "classif.kknn", …
JimBoy
0 votes, 1 answer

Caret to predict class with knn: Do I need to provide unknown classes with a random class variable?

I have a tab delimited file with 70 rows of data and 34 columns of characteristics, where the first 60 rows look like this: groups x1 x2 x3 x4 x5 (etc, up to x34) 0 0.1 0.5 0.5 0.4 0.2 1 0.2 0.3 0.8 0.4 0.1 0 …
Slowat_Kela
0 votes, 0 answers

KNN in R script in Power BI Desktop

I have been putting R code concerning KNN into the Power BI R script. In RStudio it was kind of working, but in Power BI it reports errors; the error is caused by the last line: # 'dataset' holds the input data for this script normalize <- function(x) { return…
Anna
0 votes, 0 answers

Python 2.7 using nltk built-in dataset to do sentiment analysis for KNN

As I am new to programming, I wish to know whether it is possible to use the NLTK built-in movie review dataset to do sentiment analysis, using KNN to determine the polarity of data. Is there any way to do so? import nltk from nltk.corpus import…
G.I.JEO
0 votes, 1 answer

Training using the custom dataset instead of MNIST

I would like to use a custom dataset that contains images of handwritten characters of a language other than English. I am planning to use the KNN algorithm to classify the handwritten characters. Here are some of the challenges I am facing…