2

This issue is related to Pipe '.' dot causes trouble in glm call.

purrr:map is wonderful for subgroup analysis and/or model comparison. However, when using glm, the call is messed up and causing issues, e.g. when computing pseudo-R2s. The reason is that update doesn't work with the ugly call, and thus pscl::pR2 cannot compute the log-likelihood of the base model.

pacman::p_load(tidyverse)

#sample data
pacman::p_load(ISLR)
mydata = ISLR::Default

#nest data, students and non-students
Default_nested = Default %>% group_by(student) %>% nest 

#fit glms
formul= default ~income+balance

glms = Default_nested %>% 
  mutate(model=map(data,glm,formula=formul,family='binomial')) 

#pscl::pR2 throwing error
pacman::p_load(pscl)
glms %>% mutate(pr2=map(model,pR2))

Now we can take a look at the first submodel. The call looks strange (formula=..1) even though formula contains the right formula.

> glms$model[[1]]$call
.f(formula = ..1, family = "binomial", data = .x[[i]])
> glms$model[[1]]$formula
default ~ income + balance
> glms$model[[1]]$data
# A tibble: 7,056 x 3
   default balance income
   <fct>     <dbl>  <dbl>
 1 No         730. 44362.

What is the cleanest way to be able to use pscl::pR2 when you have many (more than 2 in this example) glm objects in your tibble?

Edit:

Overview of solution strategies:

(A) "fix" the glm object, so that update can be applied to it:

glms %>% mutate(model = map(model,function(x){x$call = call2("glm",formula=x$formula,data=quote(Default),family='binomial');x})) %>%
  mutate(pr2=map(model,pR2)) %>% unnest(pr2)

This 'runs', however, the computed R2 is off. So this solution strategy is probably a dead-end.

(B) Write a wrapper for `glm, as proposed by Artem. This should work fine. Downside: the calls look ugly.

I expanded on Artem's proposed solution to create glm3.

glm3 <- function(formula,data,family) {  
  eval(rlang::expr( glm(!!rlang::enexpr(data),
                        formula=!!formula,
                        family=!!family ) ))}

glms3 <- Default_nested %>% mutate( model=map(data,glm3,formula=formul,family='binomial'),pr2=map(model,pR2) )
glms3 %>% unnest(pr2)

(C) In this particular case (pseudo R2s), simply write a better pseudo-r2 function. Since it's probably the only major statistic that doesn't work within purrr::map, this may actually make sense. I put together the psr2glm function.

psr2glm=function(glmobj){

  L.base=
    logLik(
      glm(formula = reformulate('1',gsub( " .*$", "", deparse(glmobj$formula) )),
          data=glmobj$data,
          family = glmobj$family))

  n=length(glmobj$residuals)

  L.full=logLik(glmobj)
  D.full <- -2 * L.full
  D.base <- -2 * L.base
  G2 <- -2 * (L.base - L.full)

  return(data.frame(McFadden = 1-L.full/L.base, 
                    CoxSnell = 1 - exp(-G2/n),
                    Nagelkerke = (1 - exp((D.full - D.base)/n))/(1 - exp(-D.base/n))))

}

It works:

glms = Default_nested %>% 
  mutate(model=map(data,glm,formula=formul,family='binomial')) 
glms %>% mutate(pr2=map(model,psr2glm)) %>% unnest(pr2)

I consider proposing changes to DescTools:::PseudoR2, however, I first need to check if the solution is general.

The key to this idea is to skip update and instead directly call glm. All required information pieces are within the glm object, even within purrr::map. Nice side effect of using psr2glm: unnest's output looks nice.

(D) Change either glm or update. Given that the glm object actually contains all necessary information, one could consider the observed behavior a bug. So it should be fixed in base R.

inferator
  • 474
  • 3
  • 12
  • 1
    Check out the (unresolved) discussion [here](https://github.com/tidyverse/purrr/issues/407) – paqmo Aug 02 '19 at 03:00
  • Thanks. It's unresolved and closed. The limitation of the tidy dialect. – inferator Aug 02 '19 at 03:05
  • `glms$model[[1]]$call=glm(1~1,Default,family = 'binomial')$call glms$model[[2]]$call=glm(1~1,Default,family = 'binomial')$call glms %>% mutate(pr2=map(model,pR2)) %>% unnest(pr2)` "works" ... you can put the same call into each nested model ... and you get the results. But this is super hacky. – inferator Aug 02 '19 at 04:05

1 Answers1

2

One way is to define a wrapper for glm() that puts data directly inside the call by manually constructing the expression and then evaluating it:

glm2 <- function(.df, ...) {
  eval(rlang::expr(glm(!!rlang::enexpr(.df),!!!list(...)))) }

glms <- Default_nested %>%
    mutate( model = map(data,glm2,formula=formul,family="binomial"),
            pr2   = map(model,pscl::pR2) )
# # A tibble: 2 x 4
#   student data                 model  pr2      
#   <fct>   <list>               <list> <list>   
# 1 No      <tibble [7,056 × 3]> <glm>  <dbl [6]>
# 2 Yes     <tibble [2,944 × 3]> <glm>  <dbl [6]>

Validation:

## Perform the computation by hand and ensure that it's identical to glms$pr2
glm(Default_nested$data[[1]], formula=default~income+balance, family="binomial") %>%
  pscl::pR2() %>% identical( glms$pr2[[1]] )     # TRUE
glm(Default_nested$data[[2]], formula=default~income+balance, family="binomial") %>%
  pscl::pR2() %>% identical( glms$pr2[[2]] )     # TRUE
Artem Sokolov
  • 13,196
  • 4
  • 43
  • 74
  • Great idea. Just why can't glm2 also quote formula and family? – inferator Aug 19 '19 at 00:12
  • `glm3 <- function(.df,formula,family) { eval(rlang::expr( glm(!!rlang::enexpr(.df), formula=!!formula, family=!!family ) ))} glms3 <- Default_nested %>% mutate( model=map(data,glm3,formula=formul,family='binomial'),pr2=map(model,pR2) ) glms3 %>% unnest(pr2)` this wrapper seems to work – inferator Aug 19 '19 at 00:33
  • @inferator: Yes, that's precisely how one would do it. I modified my answer to show how `...` can be used to cover all additional parameters to `glm()`, including `formula` and `family`. – Artem Sokolov Aug 19 '19 at 19:53
  • Nice. I guess that warrants an acceptance, though I would still appreciate alternative solutions. Wrapping `glm` is just one way to solve this. – inferator Aug 19 '19 at 23:59