I am trying to do a lasso regression for a binary classification task in mlr3 using the learner lrn("classif.cv_glmnet").
My goal is to train this learner on the final task and access the model, including the chosen lambda value (determined by CV) and the fitted β coefficients.
As can be seen in the code below, my problem is that I cannot access the "final final" model, as I only get the β coefficients for different lambda values. Of course I could manually inspect the CV output, search for lambda.1se and then look up which β coefficients correspond to this value, but I guess there is a more straightforward way?
I am not sure how to phrase my question more precisely, so I have recreated my problem with the penguins task and put the questions in the code below.
I would be very happy if someone could help me!
Thanks in advance :)
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

task = tsk("penguins")
learner_lasso = lrn("classif.cv_glmnet", predict_type = "prob")
# create a graph learner with imputation (median for numeric features, mode for categorical features)
# and one-hot encoding, because glmnet cannot handle factor features or missing values
# (this part can be ignored for answering the question!)
glrn_lasso = as_learner(
  po("imputemedian", param_vals = list(affect_columns = selector_type(c("numeric", "integer")))) %>>%
    po("imputemode", param_vals = list(affect_columns = selector_type(c("factor", "character")))) %>>%
    po("encode", method = "one-hot") %>>%
    learner_lasso
)
# train learner on task
glrn_lasso$train(task)
# access trained model
glrn_lasso$model
# lambda.min and lambda.1se determined by 10-fold CV
glrn_lasso$model$classif.cv_glmnet$model
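# side note: the cv.glmnet object also stores both candidate lambdas directly
# (a minimal sketch, assuming the standard cv.glmnet fields lambda.min / lambda.1se)
cvfit = glrn_lasso$model$classif.cv_glmnet$model
cvfit$lambda.min
cvfit$lambda.1se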
# fitted lasso regression model
glrn_lasso$model$classif.cv_glmnet$model$glmnet.fit
# QUESTION 1: can someone explain to me what this output means exactly (especially how to interpret the %Dev column for the different lambdas)? I guess Df is the number of nonzero beta coefficients when choosing the lambda shown on the right?
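# to illustrate what I am looking at, the printed columns seem to correspond to fields of the
# glmnet fit object (sketch, assuming the standard glmnet fields df, dev.ratio and lambda)
fit = glrn_lasso$model$classif.cv_glmnet$model$glmnet.fit
cbind(df = fit$df, pct_dev = fit$dev.ratio, lambda = fit$lambda)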
# give the beta coefficients for the fitted model
glrn_lasso$model$classif.cv_glmnet$model$glmnet.fit$beta
# QUESTION 2: this output shows the beta coefficients for different lambdas (e.g. s0, s1, ...). However, I only want the coefficients for the final fitted model (I guess this is the one with s = lambda.1se chosen by CV before)?
# How can I access this final model without manually choosing s = lambda.1se?
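# the manual workaround I would like to avoid looks roughly like this
# (sketch, passing s = "lambda.1se" to coef() on the cv.glmnet object)
coef(glrn_lasso$model$classif.cv_glmnet$model, s = "lambda.1se")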
# evaluate performance with resampling
# QUESTION 3: isn't this nested resampling, as the CV to find the best lambda is performed within the resampling process?
rr = resample(
  task = task,
  learner = glrn_lasso,
  resampling = rsmp("cv", folds = 3),
  store_models = TRUE
)
# estimated performance of the lasso model trained on the whole dataset
rr$aggregate()
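# per-fold scores for reference (sketch, using the default classification error measure)
rr$score(msr("classif.ce"))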
# access the learners created during resampling (their models differ from the final model fitted on the whole task above)
rr$learners[[1]]$model # model of the first resampling fold
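# each resampling fold runs its own internal CV, so every fold has its own lambda.1se (sketch)
rr$learners[[1]]$model$classif.cv_glmnet$model$lambda.1se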
# now I want to do the same as above for each resampled model: look at the beta coefficients that were actually used in the performance evaluation
rr$learners[[1]]$model$classif.cv_glmnet$model$glmnet.fit$beta
# however, same problem here again: I do not want all betas, but only those that were chosen in the final model (without manually looking for the s that corresponds to the CV result (e.g. lambda.1se) of the respective resampling fold)
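# again, the manual route per fold would be something like this
# (sketch, passing s = "lambda.1se" to coef() for every resampling iteration)
lapply(rr$learners, function(l) coef(l$model$classif.cv_glmnet$model, s = "lambda.1se"))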