I am a student and I am currently doing some testing for my module.
For X, I have defined 4 features: Battery, twosim (converted so that Yes = 1 and No = 0), talktime and phonecore. For Y, I have costRange, which takes one of four values: very expensive, expensive, cheap or very cheap.
In total, I have 2000 rows of X and Y.
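A minimal sketch of the Yes/No conversion for `twosim`, assuming the raw column holds the strings "Yes" and "No" (the three sample rows are made up). `Series.map` with a dict replaces each value. Anything not in the dict, such as a stray "yes", would become NaN, so it is worth checking for missing values afterwards.

```python
import pandas as pd

# Hypothetical rows; only the column name 'twosim' comes from the question
df = pd.DataFrame({"twosim": ["Yes", "No", "Yes"]})

# Map the Yes/No strings to 1/0; unmapped values would become NaN
df["twosim"] = df["twosim"].map({"Yes": 1, "No": 0})
print(df["twosim"].tolist())  # [1, 0, 1]
```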
I am trying to use KNN to predict Y, with a 70/30 split (70% training, 30% test).
First, I converted Y to a one-hot encoding:
```python
import pandas as pd

Y = df['costrange']
Ycoded = pd.get_dummies(Y, prefix='cr')
```
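The sketch below runs `pd.get_dummies` on a few made-up labels to show what it produces: one column per class, in sorted label order. For comparison it also shows scikit-learn's `LabelEncoder`, which keeps a single integer column. As far as I know, `KNeighborsClassifier` also accepts the original string labels directly, so one-hot encoding the target is optional rather than required.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels using the four cost ranges named in the question
Y = pd.Series(["cheap", "very expensive", "expensive", "very cheap", "cheap"])

# One-hot encoding: one 0/1 column per class; astype(int) because
# recent pandas versions return boolean columns by default
Ycoded = pd.get_dummies(Y, prefix="cr").astype(int)
print(list(Ycoded.columns))
# ['cr_cheap', 'cr_expensive', 'cr_very cheap', 'cr_very expensive']

# Single-column alternative: one integer code per row
encoder = LabelEncoder()
y_codes = encoder.fit_transform(Y)
print(list(encoder.classes_))  # classes are sorted alphabetically
print(y_codes.tolist())  # [0, 3, 1, 2, 0]
```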
Next, I split the data into training and test sets:
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Ycoded, test_size=0.30)
```
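A sketch of the same 70/30 split on made-up stand-in data (2000 rows, 4 features), adding two optional `train_test_split` arguments that the original code does not use: `stratify`, which keeps the proportion of each cost class roughly the same in both sets, and `random_state`, which makes the split repeatable between runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 2000 rows, 4 features, 4 cost classes
rng = np.random.default_rng(0)
X = rng.random((2000, 4))
y = rng.choice(["very expensive", "expensive", "cheap", "very cheap"], size=2000)

# 70/30 split; stratify=y preserves class proportions in both sets,
# random_state fixes the shuffle so results are reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (1400, 4) (600, 4)
```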
Next, I scale X using MinMaxScaler:
```python
from sklearn import preprocessing

scaler = preprocessing.MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)
```
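For reference, the usual MinMaxScaler pattern is to learn the minimum and maximum from the training set only and reuse them on the test set with `transform`. Calling `fit_transform` on the test set rescales it with its own range instead, so the same raw value can map to different scaled values in the two sets. The tiny one-feature arrays below are hypothetical and only make the difference visible.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical one-feature arrays chosen to make the effect obvious
X_train = np.array([[0.0], [10.0]])
X_test = np.array([[5.0], [20.0]])

# Learn min/max from the training set only, then reuse them on the test set
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled.ravel().tolist())  # [0.5, 2.0]

# Refitting on the test set rescales it with its own min/max instead
refit = MinMaxScaler().fit_transform(X_test)
print(refit.ravel().tolist())
```

Values outside the training range (like 20 here) simply map outside [0, 1], which is expected with this pattern.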
After that, I run KNN with n_neighbors=4 and uniform weights:
```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=4, weights='uniform')
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)
```
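The sketch below, on made-up data, contrasts fitting `KNeighborsClassifier` on a one-hot target with fitting it on the plain labels. As I understand it, with a 2-D target scikit-learn treats each column as a separate output and votes on it independently, so `predict` returns one 0/1 column per class and nothing forces each row to contain exactly one 1. With 1-D labels it returns a single class per row.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in data: 40 rows, 4 features, all 4 cost classes present
rng = np.random.default_rng(0)
X = rng.random((40, 4))
labels = np.array(["very expensive", "expensive", "cheap", "very cheap"] * 10)

# One-hot target: each column is treated as its own output
knn_onehot = KNeighborsClassifier(n_neighbors=4, weights="uniform")
knn_onehot.fit(X, pd.get_dummies(labels).astype(int))
print(knn_onehot.predict(X[:5]).shape)  # (5, 4)

# 1-D labels: one predicted class per row
knn_labels = KNeighborsClassifier(n_neighbors=4, weights="uniform")
knn_labels.fit(X, labels)
print(knn_labels.predict(X[:5]).shape)  # (5,)
```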
Finally, to check the accuracy and other metrics:
```python
from sklearn import metrics
from sklearn.metrics import confusion_matrix

print(metrics.accuracy_score(y_test, y_pred))
print(knn.score(X_test_scaled, y_test))
matrix = confusion_matrix(y_test, y_pred)
print(matrix)
```
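A sketch of how the confusion-matrix step can work with one-hot arrays, using small hypothetical arrays. `confusion_matrix` accepts only binary or multiclass labels, so a one-hot ("multilabel-indicator") input raises a `ValueError`. Taking `argmax` along each row turns the one-hot rows back into class indices first. One caveat: a predicted row that is all zeros would also map to index 0, silently counting as the first class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical one-hot rows (3 classes) just to show the conversion
y_test_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_pred_onehot = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]])

# confusion_matrix rejects one-hot (multilabel-indicator) input
try:
    confusion_matrix(y_test_onehot, y_pred_onehot)
except ValueError as err:
    print("rejected:", err)

# Convert each row back to a single class index, then build the matrix
y_test_idx = y_test_onehot.argmax(axis=1)
y_pred_idx = y_pred_onehot.argmax(axis=1)
matrix = confusion_matrix(y_test_idx, y_pred_idx)
print(matrix)
# [[1 0 0]
#  [0 1 1]
#  [0 0 1]]
print(accuracy_score(y_test_idx, y_pred_idx))  # 0.75
```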
My accuracy is only 13%.
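When `y_test` and `y_pred` are one-hot matrices, `accuracy_score` computes subset accuracy: a row counts as correct only when every column matches. The made-up example below has two exact matches out of four rows, plus one all-zero prediction and one row with two 1s, which gives 0.5.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical 4-class one-hot targets and predictions
y_true = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
y_pred = np.array([[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]])

# With indicator matrices, a row is correct only if all columns match exactly
print(accuracy_score(y_true, y_pred))  # 0.5
```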
I am also unable to print the confusion matrix:
Can anyone tell me what went wrong?