
I'm a bit confused about how to interpret the coefficients from the elastic net model I'm fitting with tidymodels and glmnet. Ideally, I'd like to produce unscaled coefficients for maximum interpretability.

My issue is that I'm not sure how to unscale the coefficients the model yields, because I can't quite figure out what scaling is applied in the first place.

It's a bit tricky for me to post the data one would need to reproduce my results, but here's my code:

library(tidymodels)
library(tidyverse)

# preps data for model
myrecipe <- mydata %>%
  recipe(transactionrevenue ~ sessions + channelgrouping + month + new_user_pct + is_weekend) %>%
  step_novel(all_nominal(), -all_outcomes()) %>%
  step_dummy(month, channelgrouping, one_hot = TRUE) %>%
  step_zv(all_predictors()) %>%
  step_normalize(sessions, new_user_pct) %>%
  step_interact(terms = ~ sessions:starts_with("channelgrouping") + new_user_pct:starts_with("channelgrouping"))
  
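# creates the model; standardize = FALSE turns off glmnet's internal
# standardization, so the only scaling is step_normalize() in the recipe
mymodel <- linear_reg(penalty = 10, mixture = 0.2) %>%
  set_engine("glmnet", standardize = FALSE)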

wf <- workflow() %>%
  add_recipe(myrecipe)

model_fit <- wf %>%
  add_model(mymodel) %>%
  fit(data = mydata)
  
# shows the model coefficients
tidy(model_fit)

If it would help, here's some information that might be useful:

The variable I'm really focusing on is "sessions." In the model, the coefficient for sessions is 2543.094882, and the intercept is 1963.369782. The penalty is 10.

The unscaled mean for sessions is 725.2884 and the standard deviation is 1035.381.

I just can't figure out what units the coefficients are in, or whether it's even possible to unscale them back to the original units.

Any insight would be very much appreciated.

Evan O.


You can use tidy() on many different components of a workflow. By default, tidy() tidies the model, but you can also extract the recipe and tidy it, or tidy individual recipe steps. That is where the information you're after is stored: tidying a step_normalize() step returns the mean and standard deviation it used for centering and scaling.

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
data(bivariate)

biv_rec <- 
   recipe(Class ~ ., data = bivariate_train) %>%
   step_BoxCox(all_predictors()) %>%
   step_normalize(all_predictors())

svm_spec <- svm_linear(mode = "classification")
biv_fit <- workflow(biv_rec, svm_spec) %>% fit(bivariate_train)

## tidy the *model*
tidy(biv_fit)
#> # A tibble: 3 × 2
#>   term  estimate
#>   <chr>    <dbl>
#> 1 A       -1.15 
#> 2 B        1.17 
#> 3 Bias     0.328

## tidy the *recipe*
extract_recipe(biv_fit) %>%
   tidy()
#> # A tibble: 2 × 6
#>   number operation type      trained skip  id             
#>    <int> <chr>     <chr>     <lgl>   <lgl> <chr>          
#> 1      1 step      BoxCox    TRUE    FALSE BoxCox_ZRpI2   
#> 2      2 step      normalize TRUE    FALSE normalize_DGmtN

## tidy the *recipe step*
extract_recipe(biv_fit) %>%
   tidy(number = 1)
#> # A tibble: 2 × 3
#>   terms  value id          
#>   <chr>  <dbl> <chr>       
#> 1 A     -0.857 BoxCox_ZRpI2
#> 2 B     -1.09  BoxCox_ZRpI2

## tidy the other *recipe step*
extract_recipe(biv_fit) %>%
   tidy(number = 2)
#> # A tibble: 4 × 4
#>   terms statistic   value id             
#>   <chr> <chr>       <dbl> <chr>          
#> 1 A     mean      1.16    normalize_DGmtN
#> 2 B     mean      0.909   normalize_DGmtN
#> 3 A     sd        0.00105 normalize_DGmtN
#> 4 B     sd        0.00260 normalize_DGmtN

Created on 2021-08-05 by the reprex package (v2.0.0)
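To connect those stored statistics back to the model coefficients, here is a minimal sketch on hypothetical simulated data (the names sim, norm_rec, and sim_fit are illustrative, not from the question). It uses the unpenalized "lm" engine instead of glmnet only so the back-transformed coefficients can be checked exactly against lm() on the raw predictor: the slope is divided by the stored sd, and the intercept absorbs slope × mean / sd.

```r
library(tidymodels)

# Hypothetical simulated data: a raw-scale predictor x and an outcome y
set.seed(123)
sim <- tibble(x = rnorm(200, mean = 700, sd = 1000)) %>%
  mutate(y = 2000 + 2.5 * x + rnorm(n(), sd = 500))

# Standardize x in the recipe, as step_normalize() does in the question
norm_rec <- recipe(y ~ x, data = sim) %>%
  step_normalize(x)

# Unpenalized "lm" engine so the result can be checked exactly
sim_fit <- workflow() %>%
  add_recipe(norm_rec) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = sim)

# Coefficients on the standardized scale (x is in SD units here)
coefs <- tidy(sim_fit)
b0 <- coefs$estimate[coefs$term == "(Intercept)"]
b1 <- coefs$estimate[coefs$term == "x"]

# Mean and SD that step_normalize() learned (step 1 of this recipe)
norm_stats <- extract_recipe(sim_fit) %>% tidy(number = 1)
x_mean <- norm_stats$value[norm_stats$statistic == "mean"]
x_sd <- norm_stats$value[norm_stats$statistic == "sd"]

# Re-express the same fitted line in the original units of x
slope_raw <- b1 / x_sd
intercept_raw <- b0 - b1 * x_mean / x_sd

# Reference fit on the raw predictor, for comparison
raw_coefs <- coef(lm(y ~ x, data = sim))
c(intercept_raw = intercept_raw, slope_raw = slope_raw)
raw_coefs
```

With glmnet and a nonzero penalty, this re-expression still describes the fitted model exactly, but the raw-unit numbers will generally not match a glmnet fit on unscaled data, because the penalty acts on the standardized coefficients. In the question's recipe, step_normalize() is the fourth step, so `extract_recipe(model_fit) %>% tidy(number = 4)` should return the mean and sd for sessions and new_user_pct.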

You can read more about tidying a recipe here.
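Applied to the numbers in the question, here is a worked sketch. It assumes that 725.2884 and 1035.381 are the same mean and sd that step_normalize() stored for sessions; that can be confirmed with the tidy(number = 4) call on the extracted recipe. The model sees z = (x − mean) / sd, so the coefficient is in units of transactionrevenue per one standard deviation of sessions. Rewriting the term in raw sessions x gives:

$$
z = \frac{x - 725.2884}{1035.381}, \qquad
\beta_z z = \frac{\beta_z}{1035.381}\,x \;-\; \frac{\beta_z \times 725.2884}{1035.381}
$$

$$
\frac{2543.094882}{1035.381} \approx 2.456, \qquad
\frac{2543.094882 \times 725.2884}{1035.381} \approx 1781.4
$$

So the main sessions term corresponds to roughly 2.456 units of transactionrevenue per additional session, and about 1781.4 is subtracted from the intercept. Because the recipe builds the sessions × channelgrouping interactions from the already-standardized sessions, the full raw slope for a given channel is (β_sessions + β_sessions:channel) / 1035.381, and each interaction's mean shift lands on that channel's dummy coefficient rather than on the intercept. new_user_pct needs the same treatment with its own stored mean and sd.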

Julia Silge