library(SuperLearner)
library(MASS)
set.seed(23432)
## training set
n <- 500
p <- 50
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
colnames(X) <- paste("X", 1:p, sep="")
X <- data.frame(X)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)
sl_cv <- SuperLearner(Y = Y, X = X, family = gaussian(),
                      SL.library = c("SL.mean", "SL.ranger"),
                      verbose = TRUE, cvControl = list(V = 5))
In the code above, I'm using 5-fold CV to train a SuperLearner. But what if I want to define the folds manually? I know there are clusters in my data, so I would like to run CV on folds that I've created myself.
Take, for example, the five folds below for my toy data: split1, ..., split5. Is there a way to use these 5 folds for cross-validation instead of letting SuperLearner split up the data by itself?
set.seed(1)
index <- sample(1:5, size = nrow(X), replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))
split1 <- X[index == 1, ]
split2 <- X[index == 2, ]
split3 <- X[index == 3, ]
split4 <- X[index == 4, ]
split5 <- X[index == 5, ]
split1.y <- Y[index == 1]
split2.y <- Y[index == 2]
split3.y <- Y[index == 3]
split4.y <- Y[index == 4]
split5.y <- Y[index == 5]
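One direction that might work, sketched under assumptions rather than confirmed: as far as I can tell, SuperLearner.CV.control() takes a validRows argument, a list of length V in which each element holds the row numbers of one validation fold. If so, the folds could be passed as row indices instead of as separate data frames. In the sketch below, valid_rows and sl_manual are names made up for illustration, the toy data uses fewer columns to keep it quick, and SL.glm stands in for SL.ranger so that ranger is not required.

```r
# Toy data as in the question, with fewer columns to keep the sketch quick.
set.seed(23432)
n <- 500
p <- 5
X <- data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
colnames(X) <- paste0("X", 1:p)
Y <- X[, 1] + sqrt(abs(X[, 2] * X[, 3])) + X[, 2] - X[, 3] + rnorm(n)

# Same manual fold assignment as above.
set.seed(1)
index <- sample(1:5, size = nrow(X), replace = TRUE)

# One integer vector of row numbers per fold (a list of length 5).
valid_rows <- unname(split(seq_len(nrow(X)), index))

# Assumption: cvControl accepts `validRows`, a list of length V where each
# element contains the row numbers of one validation fold.
if (requireNamespace("SuperLearner", quietly = TRUE)) {
  library(SuperLearner)
  sl_manual <- SuperLearner(Y = Y, X = X, family = gaussian(),
                            SL.library = c("SL.mean", "SL.glm"),
                            cvControl = list(V = length(valid_rows),
                                             validRows = valid_rows))
  print(sl_manual)
}
```

With real clustered data, index would be the cluster-to-fold assignment rather than a random sample. If I read the documentation correctly, SuperLearner() also has an id argument that keeps observations from the same cluster in the same validation fold, which may be a simpler route when the exact fold membership does not matter.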