Large standard error of prediction from parsnip vs base R

Question

It seems like predict is producing a standard error that is too large. I get 0.820 with a parsnip model but 0.194 with a base R model. 0.194 for a standard error seems more reasonable since about 2*0.195 above and below my prediction are the ends of the confidence interval. What is my problem/misunderstanding?

library(parsnip)
library(dplyr)

# example data
mod_dat <- mtcars %>%
  as_tibble() %>%
  mutate(cyl_8 = as.numeric(cyl == 8)) %>%
  select(mpg, cyl_8)

parsnip_mod <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(as.factor(cyl_8) ~ mpg, data = mod_dat)

base_mod <- glm(as.factor(cyl_8) ~ mpg, data = mod_dat, family = "binomial")

parsnip_pred <- tibble(mpg = 18) %>%
  bind_cols(predict(parsnip_mod, new_data = ., type = 'prob'),
            predict(parsnip_mod, new_data = ., type = 'conf_int', std_error = T)) %>%
  select(!ends_with("_0"))

base_pred <- predict(base_mod, tibble(mpg = 18), se.fit = T, type = "response") %>%
  unlist()

# these give the same prediction but different SE
parsnip_pred
#> # A tibble: 1 x 5
#>     mpg .pred_1 .pred_lower_1 .pred_upper_1 .std_error
#>   <dbl>   <dbl>         <dbl>         <dbl>      <dbl>
#> 1    18   0.614         0.230         0.895      0.820
base_pred
#>          fit.1       se.fit.1 residual.scale 
#>      0.6140551      0.1942435      1.0000000

^{Created on 2020-06-04 by the reprex package (v0.3.0)}

--EDIT--

As @thelatemail and @Limey said, using type="link" for the base model will give the standard error on the logit scale (0.820). However, I want the standard error on the probability scale. Is there an option in the parsnip documentation that I'm missing? I would like to use parsnip.

I think keeping the so post is also fine for this specific case, he clearly knows what he is asking about, there is a reprex, and the question is intriguing, also how will the guys at cross validated answer a package specific question? You can try asking on the github repo of parnsnip — Bruno, Jun 05 '20 at 05:04
You've asked for `type = "response"` explicitly which is not the default. I can get the `0.82` result using `predict(base_mod, data.frame(mpg=18), se.fit=TRUE, type="link")` - check out `?predict.glm` to see what the different `types` return. — thelatemail, Jun 05 '20 at 05:31
@thelatemail I want the standard error on the response/probability scale so that's why I used `type = "response"`. How do I get the result from the `parsnip` model to give the standard error on the same scale? See edits — Levi Baguley, Jun 09 '20 at 04:17
@LeviBaguley - you can force the output you want by using the underlying predict.glm function via `predict(parsnip_mod, new_data =tibble(mpg=18), type="raw", opts=list(se.fit=TRUE, type="response"))`, but that seems to be overcomplicated compared to using `predict.glm` directly. — thelatemail, Jun 09 '20 at 05:06

score 1 · Accepted Answer · answered Jun 05 '20 at 07:25

1

@thelatemail is correct. From the online doc for predict.glm:

type
the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities.

The default is to report using the logit scale,, 'response' requests results on the raw probability scale. It's not obvious from the parsnip::predict documentation that I found how that chooses the scale on which to return its results, but it's clear it's using the raw probability scale.

So both methods are returning correct answers, they're just using different scales.

I don't want to steal an accepted solution from @thelatemail, so invite them to post a similar answer to this.

answered Jun 05 '20 at 07:25

Limey

10,234
2
12
32

All good - i didn't have time to write up a proper answer when I posted, so have upvoted. – thelatemail Jun 06 '20 at 00:06
That makes sense! I guess my question now is how to get `.std_error` to be on the same scale as the three `.pred_*` columns. I'll edit my question. – Levi Baguley Jun 09 '20 at 03:32
@Limey is correct in their analysis. For your question though... to get the standard error on the probability scale, you would have to use something like the [Delta method](https://en.wikipedia.org/wiki/Delta_method) to do the computation. However, this is going to work very poorly as you get closer to zero or one. I don't think that it is possible to get what you want with any realistic precision. You _could_ get something very similar with a Bayesian model though (using the "stan" engine). Take a look at `rstanarm::stan_glm()` – topepo Jun 11 '20 at 22:08

Levi Baguley · Answer 2 · 2020-06-12T02:39:59.927

As @thelatemail said, you can get the standard error on the probability scale with parsnip using the arguments: type="raw", opts=list(se.fit=TRUE, type="response"). But at that point, you might as well use a base model since the output is exactly the same. However, this is still useful if you are already using a parsnip model and you want the standard error output of a base model.

library(parsnip)
library(dplyr)

mod_dat <- mtcars %>%
  as_tibble() %>%
  mutate(cyl_8 = as.numeric(cyl == 8)) %>%
  select(mpg, cyl_8)

parsnip_mod <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(as.factor(cyl_8) ~ mpg, data = mod_dat)

base_mod <- glm(as.factor(cyl_8) ~ mpg, data = mod_dat, family = "binomial")

predict(parsnip_mod, tibble(mpg = 18), type="raw",
        opts=list(se.fit=TRUE, type="response")) %>% 
  as_tibble()
#> # A tibble: 1 x 3
#>     fit se.fit residual.scale
#>   <dbl>  <dbl>          <dbl>
#> 1 0.614  0.194              1

predict.glm(base_mod, tibble(mpg = 18), se.fit = T, type="response") %>% 
  as_tibble()
#> # A tibble: 1 x 3
#>     fit se.fit residual.scale
#>   <dbl>  <dbl>          <dbl>
#> 1 0.614  0.194              1

^{Created on 2020-06-11 by the reprex package (v0.3.0)}

Large standard error of prediction from parsnip vs base R

2 Answers2