Using KNN with a one-hot encoded target variable Y

Question

I am a student and currently doing some testing for my module.

For X, I have defined 4 features: Battery, twosim (converted so that Yes = 1 and No = 0), talktime, and phonecore. For Y, I have the costRange (very expensive, expensive, cheap, very cheap).

In total, I have 2000 rows for X and Y.

I am trying to use KNN with a 70/30 split (70% training, 30% test) to predict Y.

First, I converted Y to a one-hot encoding:

Y = df['costrange']
Ycoded = pd.get_dummies(Y, prefix='cr')

Next, I split the data into training and test sets:

X_train, X_test, y_train, y_test = train_test_split(X, Ycoded, test_size = 0.30)

Next, I scale X using MinMaxScaler:

scaler = preprocessing.MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)

After that, I fit KNN with n_neighbors=4 and uniform weights:

knn = KNeighborsClassifier(n_neighbors=4, weights='uniform')
knn.fit(X_train_scaled,y_train)
y_pred=knn.predict(X_test_scaled)

Lastly, I print the accuracy and the confusion matrix:

print(metrics.accuracy_score(y_test, y_pred))
print(knn.score(X_test_scaled,y_test))

matrix = confusion_matrix(y_test, y_pred)
print(matrix)

My accuracy is only 13%.

I am not able to print the matrix as well:

[Image: Matrix Error]

Can anyone tell me what went wrong?

score 0 · Answer 1 · answered Dec 29 '22 at 07:23

This thread has the information for your matrix error.
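One likely cause, assuming the error is the one scikit-learn usually raises here: `confusion_matrix` does not accept one-hot (multilabel-indicator) targets, so it works only when Y is a single column of class labels. The sketch below is not the asker's data; the feature values and the rule that assigns classes are synthetic stand-ins made up for illustration. It keeps Y as one label column, fits the scaler on the training set only, and shows how `argmax` can map an existing one-hot matrix back to class indices.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real data (assumption): 4 features, 4 cost classes.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((2000, 4)),
                 columns=["battery", "twosim", "talktime", "phonecore"])
labels = np.array(["very cheap", "cheap", "expensive", "very expensive"])
# Made-up rule: the class depends on the first feature, so there is something to learn.
class_index = np.minimum((X["battery"].to_numpy() * 4).astype(int), 3)
y = pd.Series(labels[class_index], name="costrange")

# Split with Y kept as a single label column, not one-hot encoded.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

# Fit the scaler on the training set only, then reuse it on the test set.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=4, weights="uniform")
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)

accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred, labels=labels)
print(accuracy)
print(matrix)

# If the targets are already one-hot, argmax maps each row back to a class index.
y_onehot = pd.get_dummies(y, prefix="cr")
y_index = y_onehot.to_numpy().argmax(axis=1)
```

With one-hot targets, `KNeighborsClassifier` treats the problem as multilabel and `accuracy_score` counts a row as correct only if every column matches, which can make the reported accuracy look much lower than it would with a single label column.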

As for the accuracy, you can first experiment with the arguments, such as the weights type or the number of neighbours. You can also try other techniques; I generally prefer SVM. Also, not every dataset is predictable, so you might want to check that for your data by running a chi-squared analysis or another feature selection technique.
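A rough sketch of both suggestions, assuming scikit-learn and using synthetic stand-in data (the arrays, the parameter grid values, and the rule that generates the classes are illustrative assumptions, not taken from the question). `GridSearchCV` tries several values of `n_neighbors` and `weights`, and `chi2` gives a per-feature score against the class label.

```python
import numpy as np
from sklearn.feature_selection import chi2
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 4 non-negative features, 4 classes.
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = np.minimum((X[:, 0] * 4).astype(int), 3)  # only feature 0 is informative

# Scaling inside the pipeline means each CV fold fits the scaler
# on its own training part only.
pipe = Pipeline([("scale", MinMaxScaler()), ("knn", KNeighborsClassifier())])
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],   # illustrative values
    "knn__weights": ["uniform", "distance"],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)

# chi2 requires non-negative features; a larger score suggests a
# stronger dependence between that feature and the class label.
chi2_scores, p_values = chi2(MinMaxScaler().fit_transform(X), y)
print(chi2_scores)
```

If every feature gets a low chi-squared score on the real data, that is a hint that the features carry little information about the cost range, and tuning KNN alone is unlikely to help much.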
