These are the problem instructions I was given:
Build a K-NN classifier, use 5-fold cross-validation to evaluate its performance based on average accuracy.
Report the accuracy measure for k = 2, ..., 10.
Write your code below (Hint: you need a loop within a loop, with the outer loop going through each value of k and inner loop going through each fold):
You can manually try k = 2, ..., 10, but try to use an outer loop through each value of k.
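A minimal sketch of how the two loops in the hint nest, assuming the built-in `iris` data (label column `Species`) and `class::knn()` as the k-NN classifier. Fold assignment here uses base R `sample()` instead of caret's `createFolds()`, and `features`, `labels`, `folds`, `fold_acc`, and `avg_accuracy` are illustrative names. The inner loop fits and scores once per fold; the outer loop averages those five accuracies for each k.

```r
library(class)  # knn(); a recommended package that ships with R

data(iris)

# Min-max scale each numeric column to [0, 1] (k-NN is distance based)
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
features <- as.data.frame(lapply(iris[, 1:4], normalize))
labels <- iris$Species

# Assign each row to one of 5 folds (random, not stratified)
set.seed(123)
folds <- sample(rep(1:5, length.out = nrow(iris)))

ks <- 2:10
avg_accuracy <- setNames(numeric(length(ks)), ks)  # one named slot per k

for (k in ks) {                      # outer loop: each value of k
  fold_acc <- numeric(5)
  for (f in 1:5) {                   # inner loop: each fold
    test_rows <- which(folds == f)
    pred <- knn(train = features[-test_rows, ],
                test  = features[test_rows, ],
                cl    = labels[-test_rows],
                k     = k)
    fold_acc[f] <- mean(pred == labels[test_rows])  # accuracy on this fold
  }
  avg_accuracy[as.character(k)] <- mean(fold_acc)   # average over the 5 folds
}

print(round(avg_accuracy, 3))
```

Using `cv <- caret::createFolds(labels, k = 5)` with `for (test_rows in cv)` as the inner loop would give class-stratified folds with the same overall shape.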
I was given two `for` loops, listed below: one for creating the folds and the other for iterating over k = 1:10.
```r
# Given data
library(datasets)
data(iris)
library(dplyr)

# Min-max normalization: rescale a numeric vector to [0, 1]
normalize = function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

# normalize the four numeric columns
Iris_normalized = IrisData %>% mutate_at(1:4, normalize)
```
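The given code stores the scaled columns in `Iris_normalized`, while the fold loop below trains on `IrisData`; since k-NN compares distances, the scaled columns are the ones it should see. A small sketch of what `normalize` does, plus a base-R equivalent of the `mutate_at(1:4, normalize)` step on the built-in `iris` data (`iris_norm` is an illustrative name):

```r
# Min-max normalization as in the given code: rescales a vector to [0, 1]
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

print(normalize(c(2, 4, 6, 10)))   # 0.00 0.25 0.50 1.00

# Base-R equivalent of mutate_at(1:4, normalize)
data(iris)
iris_norm <- iris
iris_norm[, 1:4] <- lapply(iris[, 1:4], normalize)

# Each numeric column now spans exactly [0, 1]
print(sapply(iris_norm[, 1:4], range))
```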
```r
# Create folds (createFolds() and confusionMatrix() come from caret; rpart() from rpart)
library(caret)
library(rpart)

cv = createFolds(y = IrisData$class, k = 5)
accuracy = c()
for (test_rows in cv) {
  IrisData_train = IrisData[-test_rows, ]
  IrisData_test = IrisData[test_rows, ]
  tree = rpart(class ~ ., data = IrisData_train,
               method = "class", parms = list(split = "information"))
  pred_tree = predict(tree, IrisData_test, type = "class")
  cm = confusionMatrix(pred_tree, IrisData_test[, 5])
  accuracy = c(accuracy, cm$overall[1])
}
print(mean(accuracy))
```
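The fold loop iterates over a list of test-row indices. A sketch of building such a list in base R with `set.seed()`, assuming the built-in `iris` data. It stratifies by `Species`, much as caret's `createFolds()` does for a factor `y`; `fold_id` and `cv` are illustrative names.

```r
data(iris)

set.seed(42)  # reproducible fold assignment
n <- nrow(iris)
fold_id <- integer(n)

# Stratified: shuffle fold labels 1..5 separately within each species
for (cls in levels(iris$Species)) {
  rows <- which(iris$Species == cls)
  fold_id[rows] <- sample(rep(1:5, length.out = length(rows)))
}

# A list of test-row index vectors, one per fold
cv <- split(seq_len(n), fold_id)
print(table(fold = fold_id, species = iris$Species))  # 10 of each species per fold

for (test_rows in cv) {
  train <- iris[-test_rows, ]
  test  <- iris[test_rows, ]
  # every row lands in exactly one of train/test for this fold
  stopifnot(nrow(train) + nrow(test) == n)
}
```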
```r
# Manual K validation (k-means: total within-cluster SSE for each k)
SSE_curve <- c()
for (k in 1:10) {
  print(k)
  kcluster = kmeans(utility_normalized, centers = k)
  sse = kcluster$tot.withinss
  print(sse)
  SSE_curve[k] = sse
}
```
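The given loop stores results with `SSE_curve[k] = sse`. That works for k = 1:10, but if the outer loop starts at k = 2, as the instructions ask, slot 1 is left as `NA`. A sketch of storing one value per k by name instead, reusing the k-means loop on min-max-scaled built-in `iris` columns (`ks`, `naive`, `X`, and `sse_by_k` are illustrative names):

```r
data(iris)
ks <- 2:10  # k = 2, ..., 10 as the assignment asks

# Pitfall: indexing by k leaves slot 1 empty when the loop starts at 2
naive <- c()
for (k in ks) naive[k] <- k^2
print(naive)  # first element is NA; length is 10, not 9

# Alternative: one named slot per k
X <- sapply(iris[, 1:4], function(x) (x - min(x)) / (max(x) - min(x)))
set.seed(1)
sse_by_k <- setNames(numeric(length(ks)), ks)
for (k in ks) {
  fit <- kmeans(X, centers = k, nstart = 10)
  sse_by_k[as.character(k)] <- fit$tot.withinss
}
print(round(sse_by_k, 3))
```

The same named-vector pattern works for averaged k-NN accuracies in place of SSE.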
So, if I understand the instructions correctly, I need to:
- Create 5 folds from the normalized data, using a `for` loop and `set.seed`.
- Use a `for` loop to find the accuracy for k = 1:10 on each fold.
I am not sure how to combine these two `for` loops to get the result the instructions ask for.