
I'm pretty much a beginner, but I'm stumped and I feel like I shouldn't be.

I am learning about the kNN algorithm using numpy. The code is the following:

    import operator
    from numpy import tile

    def kNN_classify(query, dataset, labels, k):
        # Euclidean distance from the query to every row of dataset
        dataSetSize = dataset.shape[0]
        diffMat = tile(query, (dataSetSize, 1)) - dataset
        sqDiffMat = diffMat**2
        sqDistances = sqDiffMat.sum(axis=1)
        distances = sqDistances**0.5
        # Row indices ordered from nearest to farthest
        sortedIndices = distances.argsort()
        # Each of the k nearest points votes for its own label
        classCount = dict()
        for i in range(k):
            voteIlabel = labels[sortedIndices[i]]
            classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        # Return the label with the most votes
        sortedClassCount = sorted(classCount.items(),
                                  key=operator.itemgetter(1), reverse=True)
        return sortedClassCount[0][0]
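The first five lines of the function body are Pythagoras applied to every row at once. A minimal sketch on hypothetical toy data (the points and query are made up for illustration) shows each intermediate step, and that the tile call can be replaced by plain NumPy broadcasting:

```python
import numpy as np

# Hypothetical toy data: scaled 3-4-5 triangles, with the query at the origin.
dataset = np.array([[0.0, 0.0],
                    [3.0, 4.0],
                    [6.0, 8.0]])
query = np.array([0.0, 0.0])

# tile stacks one copy of the query per row of dataset, so the subtraction
# is row by row; plain broadcasting (query - dataset) gives the same array.
diffMat = np.tile(query, (dataset.shape[0], 1)) - dataset
assert np.array_equal(diffMat, query - dataset)

# On a NumPy array, ** is element-wise: every coordinate difference is
# squared on its own (it is not a matrix power).
sqDiffMat = diffMat ** 2

# Summing each row (axis=1) gives dx**2 + dy**2 per point; the square root
# turns that into the Euclidean distance (Pythagoras, one row at a time).
distances = sqDiffMat.sum(axis=1) ** 0.5

# Same result from NumPy's built-in norm along each row.
assert np.allclose(distances, np.linalg.norm(query - dataset, axis=1))
print(distances)  # distances 0, 5 and 10
```

Because the result has one entry per row of dataset, distances[j] is the distance of point j, in the same order as the data.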

I first use basic Pythagoras (the Euclidean distance) to get an array of the distances from the query point. In this array the distances are in the same order as the points in the original data, and therefore in the same order as the labels. Then I use argsort to get an array of indices that gives the sorting order of the distances. Finally, I iterate over the first k entries of this array and, for each entry, vote for the label at that index.
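A small trace may help to see why indexing labels with the argsort output works. This is a sketch on hypothetical toy data (four made-up points, labels 'A'/'B', and k = 3), using broadcasting instead of tile for the difference step:

```python
import numpy as np

# Hypothetical toy data: four 2-D points and their labels, in the same order.
dataset = np.array([[1.0, 1.0],
                    [5.0, 5.0],
                    [1.5, 1.0],
                    [6.0, 5.0]])
labels = ['A', 'B', 'A', 'B']
query = np.array([1.2, 1.1])
k = 3

# Euclidean distance from the query to every row (same order as labels).
distances = np.sqrt(((query - dataset) ** 2).sum(axis=1))

# argsort returns row POSITIONS ordered from nearest to farthest,
# not the sorted distances themselves.
sorted_indices = distances.argsort()

# labels[j] belongs to dataset[j], so indexing labels with those positions
# gives the labels of the k nearest points, nearest first.
nearest_labels = [labels[j] for j in sorted_indices[:k]]

# Tally one vote per neighbour, as in the loop above.
votes = {}
for lab in nearest_labels:
    votes[lab] = votes.get(lab, 0) + 1

print(sorted_indices)   # [0 2 1 3]
print(nearest_labels)   # ['A', 'A', 'B']
print(votes)            # {'A': 2, 'B': 1}
```

The distances themselves are never sorted in place; argsort only produces positions, and those positions are valid indices into both dataset and labels because the two share the same row order.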

I understand that it works (I have tested it), but I cannot grasp how. What am I missing?

BioHazZzZard

1 Answer


Please try this. Have a look at this part of the code: sqDiffMat = diffMat**2. Also, you are probably on Python 3, so use classCount.items() instead of classCount.iteritems(); iteritems() was removed in Python 3.

    import operator
    from numpy import tile

    def classify0(inX, dataSet, labels, k):
        dataSetSize = dataSet.shape[0]
        diffMat = tile(inX, (dataSetSize, 1)) - dataSet
        sqDiffMat = diffMat**2
        sqDistances = sqDiffMat.sum(axis=1)
        distances = sqDistances**0.5
        sortedDistIndicies = distances.argsort()
        classCount = {}
        for i in range(k):
            voteIlabel = labels[sortedDistIndicies[i]]
            classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        # Python 3: dict.items() instead of the removed dict.iteritems()
        sortedClassCount = sorted(classCount.items(),
                                  key=operator.itemgetter(1),
                                  reverse=True)
        return sortedClassCount[0][0]

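The voting step on its own, in Python 3, might look like the sketch below. The list of neighbour labels is hypothetical; collections.Counter is shown only as a standard-library alternative to the manual dict tally, not as part of the original code:

```python
import operator
from collections import Counter

# Hypothetical labels of the k = 5 nearest neighbours, nearest first.
nearest_labels = ['A', 'B', 'A', 'C', 'A']

# Manual tally, as in classify0.
classCount = {}
for label in nearest_labels:
    classCount[label] = classCount.get(label, 0) + 1

# Python 3: dict.items() replaces the removed dict.iteritems().
sortedClassCount = sorted(classCount.items(),
                          key=operator.itemgetter(1),
                          reverse=True)
winner = sortedClassCount[0][0]

# Counter does the same tally; most_common(1) returns [(label, count)].
winner_counter = Counter(nearest_labels).most_common(1)[0][0]

print(sortedClassCount)        # [('A', 3), ('B', 1), ('C', 1)]
print(winner, winner_counter)  # A A
```

Sorting by the count (itemgetter(1)) in descending order puts the most-voted label first, so sortedClassCount[0][0] is that label.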
Armand