I am trying to build a model with the glmnet package in R. I want to do these steps:

1. Randomly split the data into 5 folds.

2. For each fold:

a. Remove the fold from the data.

b. Use the remaining data to train an elastic-net model, using 10-fold cross-validation to tune the lambda parameter.

c. With the trained model, predict on the held-out fold and compute performance statistics such as R2 and MSE.

3. Calculate the average and standard deviation of each of these statistics across the 5 folds.

4. Train a new elastic-net model using all of the data.

My datasets are estimates_expression (the response) and demographic_info (the predictors).
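The random 5-fold split and the hold-out statistics (R2 and MSE) described above could be sketched in base R roughly as follows. The names fold_id, mse and r2 are hypothetical helpers introduced for illustration, and computing R2 against the mean of the hold-out fold is only one of several possible definitions.

```r
# Assumption: 150 rows, matching the dput output of the response
set.seed(4)
n <- 150
k <- 5                        # number of outer folds

# Shuffle the labels 1..k so every row lands in exactly one fold
fold_id <- sample(rep(seq_len(k), length.out = n))
table(fold_id)                # fold sizes

# Hold-out statistics for observed values y and predictions y_hat
mse <- function(y, y_hat) mean((y - y_hat)^2)
r2  <- function(y, y_hat) 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
```

Rows with fold_id == i would form the hold-out set for fold i, and all other rows the training set.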

dput(estimates_expression)
structure(list(values = c(5.1, 4.9, 4.7, 4.6, 5, 5.4, 4.6, 5, 
4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 
5.1, 4.6, 5.1, 4.8, 5, 5, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 
4.9, 5, 5.5, 4.9, 4.4, 5.1, 5, 4.5, 4.4, 5, 5.1, 4.8, 5.1, 4.6, 
5.3, 5, 7, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5, 5.9, 
6, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 
6.6, 6.8, 6.7, 6, 5.7, 5.5, 5.5, 5.8, 6, 5.4, 6, 6.7, 6.3, 5.6, 
5.5, 5.5, 6.1, 5.8, 5, 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 
7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 
6.4, 6.5, 7.7, 7.7, 6, 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 
6.4, 7.2, 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6, 6.9, 6.7, 
6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9)), class = "data.frame", row.names = c(NA, 
-150L))

dput(demographic_info[1:5,1:3])
structure(list(region = c(3.5, 3, 3.2, 3.1, 3.6), date = c(1.4, 
1.4, 1.3, 1.5, 1.4), demo = c(0.2, 0.2, 0.2, 0.2, 0.2)), row.names = c(NA, 
5L), class = "data.frame")

I wrote code that runs the cross-validation with the glmnet package, but I don't know how to first divide the data into 5 folds and then, for each fold, run the cross-validation and compute R2 and MSE.

library(glmnet)

set.seed(4)
# Use 140 rows for training; the remaining rows are held out for testing
indexes <- sample(nrow(demographic_info), 140, replace = FALSE)
# glmnet needs a numeric matrix for x and a numeric vector for y
x_train <- as.matrix(demographic_info[indexes, ])
y_train <- estimates_expression$values[indexes]
x_test  <- as.matrix(demographic_info[-indexes, ])
y_test  <- estimates_expression$values[-indexes]

# Fit model with training data.
set.seed(20)
n_folds <- 10
fit <- cv.glmnet(x_train, y_train, nfolds = n_folds, alpha = 0.5, type.measure = 'mse')
# Predict test data using model that had minimal mean-squared error in cross validation
y_pred <- predict(fit, newx = x_test, s = 'lambda.min')

In the last step of my code, I predict only on the test data, but I want to get predictions for the entire dataset. Can someone provide guidance on how to do this? Thank you.
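One possible way to put the whole procedure together is sketched below: an outer 5-fold loop for evaluation, an inner 10-fold cv.glmnet call for tuning lambda, and a final fit on all rows. This is a sketch under assumptions, not a definitive implementation. The data are simulated stand-ins with the same shape as the real ones, and results, outer_id and summary_stats are hypothetical names. Because every row is held out exactly once, the outer loop yields a prediction for each row of the dataset.

```r
library(glmnet)

# Simulated stand-in data (assumption); replace with the real objects
set.seed(4)
n <- 150
x <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("region", "date", "demo")))
y <- drop(x %*% c(1, 0.5, 0)) + rnorm(n)

k_outer <- 5
outer_id <- sample(rep(seq_len(k_outer), length.out = n))
results <- data.frame(fold = seq_len(k_outer), mse = NA_real_, r2 = NA_real_)

for (i in seq_len(k_outer)) {
  test <- outer_id == i
  # Inner 10-fold CV tunes lambda on the training rows only
  fit <- cv.glmnet(x[!test, ], y[!test], nfolds = 10,
                   alpha = 0.5, type.measure = "mse")
  y_hat <- drop(predict(fit, newx = x[test, ], s = "lambda.min"))
  results$mse[i] <- mean((y[test] - y_hat)^2)
  results$r2[i]  <- 1 - sum((y[test] - y_hat)^2) /
                        sum((y[test] - mean(y[test]))^2)
}

# Mean and standard deviation of each statistic across the 5 hold-out folds
summary_stats <- sapply(results[c("mse", "r2")],
                        function(v) c(mean = mean(v), sd = sd(v)))
print(summary_stats)

# Final model: tune lambda by 10-fold CV on all rows
final_fit <- cv.glmnet(x, y, nfolds = 10, alpha = 0.5, type.measure = "mse")
coef(final_fit, s = "lambda.min")
```

With the real data, x would presumably be as.matrix(demographic_info) and y would be estimates_expression$values.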

rheabedi1