
I'm building my own 1-NN classifier in Python because I need maximum speed in certain operations when testing it: I want to use it inside a genetic algorithm, where every millisecond matters.

I'm trying to implement a leave-one-out test inside my KNN class with numpy, but I only get about 50% accuracy with it. When I try the scikit-learn KNN with the same leave-one-out scheme, it returns about 97% accuracy.

This is my KNN class:

    import numpy as np


    class KNN(object):
        """A 1-NN classifier. (entrenamiento = training set, clases = labels,
        pesos = feature weights.)"""

        def __init__(self):
            super(KNN, self).__init__()

        def fit(self, entrenamiento, clases):
            self.entrenamiento = np.asarray(entrenamiento)
            self.n_examples = len(self.entrenamiento)
            self.n_features = self.entrenamiento.shape[1]
            self.clases = np.asarray(clases)
            self.createDistanceMatrix()

        def createDistanceMatrix(self):
            # Precompute the per-feature squared differences between every
            # pair of training examples.
            self.distances = np.zeros([self.n_examples,
                                       self.n_examples,
                                       self.n_features])
            for i in range(self.n_examples):
                for j in range(self.n_examples):
                    if i != j:
                        self.distances[i][j] = self.distance(
                            self.entrenamiento[i], self.entrenamiento[j])
                    else:
                        # Sentinel so an example never matches itself.
                        self.distances[i][j] = np.full(self.n_features,
                                                       10000.0)

        def distance(self, x, y):
            # Per-feature squared differences (a vector, one per feature).
            return (x - y) * (x - y)

        def predict(self, test, pesos=None):
            dist = 100000
            class_index = 0
            for i in range(self.n_examples):
                aux = self.distance(self.entrenamiento[i], test)
                if pesos is not None:
                    aux = pesos * aux
                aux = np.sum(aux)  # reduce per-feature differences to a scalar

                if aux < dist:
                    dist = aux
                    class_index = i

            return self.clases[class_index]

        def leave_one_out(self, pesos=None):
            # DONE: I just have to find the minimum of each column
            dist = np.zeros(self.n_examples)
            aciertos = 0  # number of correct classifications
            for i in range(self.n_examples):
                for j in range(self.n_examples):
                    if pesos is not None:
                        dist[i] = np.linalg.norm(
                            np.multiply(self.distances[i][j], pesos))
                    else:
                        dist[i] = np.linalg.norm(self.distances[i][j])

                if self.clases[i] == self.clases[np.argmin(dist)]:
                    aciertos = aciertos + 1

            return 100 * (aciertos / self.n_examples)

where createDistanceMatrix precalculates all the possible per-feature distances between every pair of examples and saves them to a vector. That vector will later be multiplied by a weight vector, which represents the feature-weighting problem I'm trying to solve. I have spent two days trying to find the mistake but I can't find it; my classifier just doesn't give me a decent percentage of correct classifications in leave-one-out.
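To make the intended computation concrete, here is a minimal self-contained sketch of the weighted distance that the precomputed tensor is supposed to support (the data and the pesos vector here are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((5, 3))  # 5 made-up examples with 3 features each

    # Per-feature squared differences for every pair: shape (5, 5, 3).
    # This is the same information the distance matrix precomputes.
    diffs = (X[:, None, :] - X[None, :, :]) ** 2

    # Weighting the features and reducing to one scalar per pair gives
    # the weighted L2 distance between every pair of examples.
    pesos = np.ones(3)
    weighted = np.sqrt((diffs * pesos).sum(axis=2))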

For the sklearn KNN, this is the leave-one-out that I'm testing:

    import time

    from sklearn import neighbors

    aciertos = 0  # number of correct classifications
    knn = neighbors.KNeighborsClassifier(n_neighbors=1)
    start = time.perf_counter()
    for i in range(len(train)):
        # Train on everything except the first example, then test on it.
        knn.fit(train[1:], cls[1:])
        if knn.predict([train[0]])[0] == cls[0]:
            aciertos = aciertos + 1
        # Swap the first and last examples before the next iteration.
        train[0], train[-1] = train[-1], train[0]
        cls[0], cls[-1] = cls[-1], cls[0]
    end = time.perf_counter()
    print(str(end - start) + " seconds")
    print(str(100 * (aciertos / len(train))))

This same loop, using my own classifier's predict instead of sklearn's, returns a similar success rate.
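For reference, the same experiment can also be written with scikit-learn's built-in leave-one-out splitter. This is just a sketch assuming train and cls are array-likes, not the code I timed above:

    from sklearn import neighbors
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    knn = neighbors.KNeighborsClassifier(n_neighbors=1)
    # One score (0 or 1) per held-out example; the mean is the LOO accuracy.
    scores = cross_val_score(knn, train, cls, cv=LeaveOneOut())
    print(100 * scores.mean())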

  • Why would you implement what is already implemented? Use the sklearn one with a fast Python interpreter. – Juan Antonio Gomez Moriano Apr 28 '17 at 04:23
  • What do you mean by a fast Python interpreter? I was thinking that my implementation of KNN would be faster in my particular test, so how do you suggest doing the leave-one-out? @JuanAntonioGomezMoriano – rafaelleru Apr 28 '17 at 07:27

1 Answer


I don't know if you have fixed your problem yet, but your distance looks wrong: it returns the vector of per-feature squared differences instead of reducing them to a single number.

Here is the algorithm from Stanford cs231n:

    d2(I1, I2) = sqrt( sum_p (I1_p - I2_p)^2 )
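In numpy that is just a couple of calls. Here is a minimal sketch of a scalar L2 distance, which is what a nearest-neighbor comparison needs:

    import numpy as np

    def l2_distance(x, y):
        # Sum the squared per-feature differences, then take the root,
        # so two examples are compared with a single number.
        return np.sqrt(np.sum((x - y) ** 2))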

– Yunus Emre