I tuned a glmnet regression model and extracted the coefficients as described here. That works wonderfully. However, when I use the same approach to extract the coefficients for PLSR with the mixOmics engine, I obtain a single value per term and component, as demonstrated here. For further external use I need the PLSR coefficients in the first form. I can achieve this by passing the optimal hyperparameter set to the plsr() function from the pls package and then extracting the coefficients with coef(), as shown at the end of the code below. However, I would like to avoid this extra step because I cannot pass parameters like predictor_prop to plsr(), and thus the results may differ.
Is there a more elegant way to extract the overall model coefficients of the PLSR, as for glmnet, or can I calculate them from the per-component values?
library(tidymodels)
library(plsmod)
data(Chicago)
Chicago <- Chicago %>% select(ridership, Clark_Lake, Austin, Harlem)
# create cross-validation dataset
folds <- vfold_cv(Chicago)
# create recipe (left unprepped; the workflow trains it when it is fitted)
rec <- recipe(ridership ~ ., Chicago) %>%
  step_normalize(all_predictors())
# define model
mod <- parsnip::pls(mode = "regression",
                    num_comp = tune(),
                    predictor_prop = tune()) %>%
  set_engine("mixOmics")
# define workflow
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(mod)
# run grid tuning
set.seed(123)
res <- tune_grid(wf, resamples = folds, grid = 5)
# get best model
res_best <- res %>% select_best(metric = "rmse")
# fit best model and extract coefficients
wf %>%
  finalize_workflow(res_best) %>%
  fit(Chicago) %>%
  extract_fit_parsnip() %>%
  tidy()
# extracting coefficients using plsr from pls package and coef function
p <- pls::plsr(ridership ~ ., data = Chicago, scale = TRUE, center = TRUE, ncomp = 3)
coef(p, intercept = TRUE)
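To illustrate what I mean by calculating the coefficients from the component values: for a model fitted with pls::plsr(), the overall coefficients appear to be recoverable from the per-component matrices stored on the fitted object (projection and Yloadings). The following is only a minimal sketch under that assumption, using the built-in mtcars data and an arbitrary choice of two components so that it runs on its own. I have not verified whether the values returned by tidy() for the mixOmics engine correspond to these matrices.

```r
library(pls)

# Small built-in data set so the sketch is self-contained
fit <- plsr(mpg ~ disp + hp + wt, data = mtcars,
            scale = TRUE, center = TRUE, ncomp = 2)

k <- 2  # number of components to combine (arbitrary for this sketch)

# Per-component pieces stored on the fitted pls object
R <- fit$projection[, 1:k, drop = FALSE]  # predictors x components
Q <- fit$Yloadings[, 1:k, drop = FALSE]   # outcomes x components

# Overall coefficients as the product of the two matrices
B_manual <- R %*% t(Q)

# Compare with what coef() reports for the same number of components
B_coef <- coef(fit, ncomp = k)[, , 1]
all.equal(unname(drop(B_manual)), unname(B_coef))
```

If the mixOmics fit exposes equivalent per-component matrices, the same product should give one coefficient per term, but that mapping is exactly the part I am unsure about.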
Thank you for the awesome tidymodels framework and everyone who makes it what it is!