I'm trying to implement K-nearest neighbors on the Iris dataset, but after making the predictions, yhat comes out 100% accurate for almost every K. Something must be wrong, and I have no idea what it is.
I created a column named class_id, where I mapped the class names to numbers:
- setosa = 1.0
- versicolor = 2.0
- virginica = 3.0
That column is of type float.
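For reference, a mapping like the one above could be built as in the minimal sketch below. The source column name `class` and the four sample rows are assumptions for illustration only; the real DataFrame is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame; the column name "class" is assumed.
df = pd.DataFrame({"class": ["setosa", "versicolor", "virginica", "setosa"]})

# Same mapping as described in the list above.
class_to_id = {"setosa": 1.0, "versicolor": 2.0, "virginica": 3.0}
df["class_id"] = df["class"].map(class_to_id)

print(df["class_id"].dtype)  # float64
```

Any class name missing from the dictionary would become NaN with `map`, so checking `df["class_id"].isna().sum()` afterwards is a cheap sanity test.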
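Imports:
from sklearn import metrics, preprocessing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
Getting x and y: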
x = df[['sepal length', 'sepal width', 'petal length', 'petal width']].values
type(x) shows numpy.ndarray
y = df['class_id'].values
type(y) shows numpy.ndarray
Normalizing data
x = preprocessing.StandardScaler().fit(x).transform(x.astype(float))
Creating train and test
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.2, random_state = 42)
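A variant worth comparing against is to split first and fit the scaler on the training rows only, so that no test-set statistics enter the normalization. The sketch below assumes scikit-learn's bundled `load_iris` data matches the DataFrame used here (its labels are 0, 1, 2 rather than 1.0, 2.0, 3.0), and `stratify=y` is an added option that is not in the original call.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Iris data stands in for df in the question.
X, y = load_iris(return_X_y=True)

# Split before scaling; stratify keeps the class proportions equal in both parts.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training rows only, then apply it to both parts.
scaler = StandardScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)

print(x_train_scaled.shape, x_test_scaled.shape)  # (120, 4) (30, 4)
```

On a dataset this small and well separated, the difference from scaling everything up front is likely to be minor, but it removes one possible source of optimistic scores.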
Checking best K value:
Ks = 12
for i in range(1,Ks):
    k = i
    neigh = KNeighborsClassifier(n_neighbors=k).fit(x_train,y_train)
    yhat = neigh.predict(x_test)
    score = metrics.accuracy_score(y_test,yhat)
    print('K: ', k, ' score: ', score, '\n')
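Since a single 30-row test set can flatter or penalize a model by chance, one way to check whether scores like these depend on the particular split is k-fold cross-validation. The following is a sketch under assumptions, not a drop-in replacement: it uses scikit-learn's bundled `load_iris` instead of the DataFrame above, and the 5-fold setup and the `cv_means` name are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Iris data stands in for df in the question.
X, y = load_iris(return_X_y=True)

# Five stratified folds, shuffled once with a fixed seed for repeatability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_means = {}
for k in range(1, 12):
    # The pipeline refits the scaler inside each fold, on that fold's training part only.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_means[k] = scores.mean()
    print(f"K: {k}  mean accuracy: {scores.mean():.3f}  std: {scores.std():.3f}")
```

If the mean accuracy stays high across folds, a perfect score on one particular 30-row split is less surprising.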
Result:
K: 1 score: 0.9666666666666667
K: 2 score: 1.0
K: 3 score: 1.0
K: 4 score: 1.0
K: 5 score: 1.0
K: 6 score: 1.0
K: 7 score: 1.0
K: 8 score: 1.0
K: 9 score: 1.0
K: 10 score: 1.0
K: 11 score: 1.0
Printing yhat and y_test with K = 5:
print(yhat)
print(y_test)
Result:
yhat: [2. 1. 3. 2. 2. 1. 2. 3. 2. 2. 3. 1. 1. 1. 1. 2. 3. 2. 2. 3. 1. 3. 1. 3. 3. 3. 3. 3. 1. 1.]
y_test: [2. 1. 3. 2. 2. 1. 2. 3. 2. 2. 3. 1. 1. 1. 1. 2. 3. 2. 2. 3. 1. 3. 1. 3. 3. 3. 3. 3. 1. 1.]
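To make the arithmetic behind these scores explicit, the sketch below recomputes the accuracy by hand from the two arrays exactly as printed above. It uses only the standard library; the variable names `n_correct` and `class_counts` are illustrative.

```python
from collections import Counter

# Values copied verbatim from the printed output above.
yhat = [float(v) for v in
        "2. 1. 3. 2. 2. 1. 2. 3. 2. 2. 3. 1. 1. 1. 1. "
        "2. 3. 2. 2. 3. 1. 3. 1. 3. 3. 3. 3. 3. 1. 1.".split()]
y_test = [float(v) for v in
          "2. 1. 3. 2. 2. 1. 2. 3. 2. 2. 3. 1. 1. 1. 1. "
          "2. 3. 2. 2. 3. 1. 3. 1. 3. 3. 3. 3. 3. 1. 1.".split()]

# Accuracy is simply the fraction of positions where prediction equals label.
n_correct = sum(p == t for p, t in zip(yhat, y_test))
accuracy = n_correct / len(y_test)
print(n_correct, len(y_test), accuracy)  # 30 30 1.0

# How many test rows belong to each class.
class_counts = Counter(y_test)
print(sorted(class_counts.items()))  # [(1.0, 10), (2.0, 9), (3.0, 11)]

# A single miss out of 30 reproduces the K = 1 score shown earlier.
print(29 / 30)  # 0.9666666666666667
```

With only 30 test rows, each misclassification moves the score by about 3.3 percentage points, so the K = 1 result differs from the others by exactly one sample.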
I wouldn't expect every single prediction to be correct, so there must be something wrong.