I am trying to convert the base R code in Introduction to Statistical Learning into the R tidymodels ecosystem. The book uses class::knn() and tidymodels uses kknn::kknn(). I got different results when doing KNN with a fixed k, so I stripped out tidymodels and compared class::knn() and kknn::kknn() directly, and I still got different results. class::knn() uses Euclidean distance, and kknn::kknn() uses Minkowski distance with a distance parameter of 2, which according to Wikipedia is Euclidean distance. I set the kernel in kknn to "rectangular", which according to the documentation means the neighbours are unweighted. Shouldn't the results of KNN modeling with a fixed k be the same?
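As a sanity check on the distance claim, base R's dist() gives identical values for Euclidean distance and Minkowski distance with p = 2 (a tiny toy example of my own, not part of the comparison itself):
# two toy points in 2-D
x <- rbind(c(1, 2), c(4, 6))
dist(x, method = "euclidean")        # sqrt((4 - 1)^2 + (6 - 2)^2) = 5
dist(x, method = "minkowski", p = 2) # identical: 5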
Here is (basically) the base R code from the book, using class::knn():
library(ISLR2)
# base R class::knn
train <- (Smarket$Year < 2005)                          # logical index for the training years (before 2005)
Smarket.2005 <- Smarket[!train, ]                       # hold out 2005 as the test set
dim(Smarket.2005)
Direction.2005 <- Smarket$Direction[!train]
train.X <- cbind(Smarket$Lag1, Smarket$Lag2)[train, ]   # predictors: Lag1 and Lag2
test.X <- cbind(Smarket$Lag1, Smarket$Lag2)[!train, ]
train.Direction <- Smarket$Direction[train]
the_k <- 3 # 30 shows larger discrepancies
library(class)
knn.pred <- knn(train.X, test.X, train.Direction, k = the_k)
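One detail worth noting: class::knn() breaks tied votes at random, so repeated runs can differ slightly even with the same k. Re-running the call with a seed (my addition, not in the book's listing) keeps the comparison reproducible:
set.seed(1) # fix the RNG so class::knn's random tie-breaking is repeatable
knn.pred <- knn(train.X, test.X, train.Direction, k = the_k)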
Here is my tidyverse version using kknn::kknn():
# tidyverse kknn
library(tidyverse)
Smarket_train <- Smarket %>%
  filter(Year != 2005)   # training rows, same as the train indicator above
Smarket_test <- Smarket %>%
  filter(Year == 2005)   # same rows as Smarket.2005
library(kknn)
the_knn <- kknn(
  Direction ~ Lag1 + Lag2, Smarket_train, Smarket_test,
  k = the_k, distance = 2, kernel = "rectangular"
)
fit <- fitted(the_knn)
This shows the differences:
the_k # the value of k used in both calls
# class
table(Direction.2005, knn.pred)
# kknn
table(Smarket_test$Direction, fit)
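To see exactly where the two methods disagree, the two prediction vectors can also be tabulated against each other (this assumes the objects from both chunks above are still in the workspace):
# direct comparison of the two prediction vectors
table(class_knn = knn.pred, kknn = fit)
mean(knn.pred == fit) # proportion of test days where the two predictions agree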
Did I make a stupid mistake in the coding? If not, can anybody explain the differences between class::knn() and kknn::kknn()?