I am currently taking an online Data Science: Machine Learning course. We are asked to fit a linear model with lm() 100 times and report mean(rmse) and sd(rmse) for data sets of different sizes, n = c(100, 500, 1000, 5000, 10000). Specifically, we are asked to write a function that takes the size n, builds the data set, and runs the loop that fits the 100 models; then to set the seed and use map() or sapply() to apply that function to each of the sizes in n.
My code gives the error "Error in dat$y : $ operator is invalid for atomic vectors" when I run f1. This is my code:
library(MASS)
library(caret)
library(dplyr)  # for %>% and slice()
library(purrr)  # for map()
ff <- function(n) {
  Sigma <- 9 * matrix(c(1.0, 0.5, 0.5, 1.0), 2, 2)
  dat <- MASS::mvrnorm(n, c(69, 69), Sigma) %>%
    data.frame() %>%
    setNames(c("x", "y"))
}
set.seed(1, sample.kind = "Rounding")
n <- c(100, 500, 1000, 5000, 10000)
f1 <- map(n, function(dat) {
  rmse <- replicate(100, {
    y <- dat$y
    test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
    train_set <- dat %>% slice(-test_index)
    test_set <- dat %>% slice(test_index)
    fit <- lm(y ~ x, data = train_set)
    y_hat <- fit$coef[1] + fit$coef[2] * test_set$x
    sqrt(mean((y_hat - test_set$y)^2))
  })
  structure(c(mean(rmse), sd(rmse)))
})
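A likely cause of the error: map(n, function(dat) ...) passes each element of n (a single number such as 100) in as dat, so dat$y is applied to an atomic vector, and ff is never actually called. Below is a minimal sketch of one way to restructure this, not necessarily the solution the course intends. The helper name sim_rmse and its reps argument are made up for illustration. The sketch also swaps slice() for base row indexing and uses predict() in place of the manual coefficient arithmetic.

```r
library(MASS)
library(caret)
library(dplyr)  # only needed here for %>%

# Hypothetical helper: build a data set of size n, then fit lm() `reps` times
# on random 50/50 train/test splits and summarise the test RMSE.
sim_rmse <- function(n, reps = 100) {
  Sigma <- 9 * matrix(c(1.0, 0.5, 0.5, 1.0), 2, 2)
  dat <- MASS::mvrnorm(n, c(69, 69), Sigma) %>%
    data.frame() %>%
    setNames(c("x", "y"))

  rmse <- replicate(reps, {
    # createDataPartition(list = FALSE) returns a one-column matrix;
    # as.vector() turns it into plain row indices
    test_index <- as.vector(
      createDataPartition(dat$y, times = 1, p = 0.5, list = FALSE)
    )
    train_set <- dat[-test_index, ]
    test_set <- dat[test_index, ]
    fit <- lm(y ~ x, data = train_set)
    y_hat <- predict(fit, newdata = test_set)
    sqrt(mean((y_hat - test_set$y)^2))
  })

  c(mean = mean(rmse), sd = sd(rmse))
}

set.seed(1, sample.kind = "Rounding")
n <- c(100, 500, 1000, 5000, 10000)
res <- sapply(n, sim_rmse)  # 2 x 5 matrix: rows "mean" and "sd", one column per n
```

Here the data set is generated inside the function that map() or sapply() calls, so each n produces its own data frame before dat$y is used.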
Thank you for your help!!