
dplyr's pipe (`%>%`, re-exported from magrittr) does not pass the names of objects down the chain. This is well known. However, it leads to unexpected complications after you fit a glm model: functions that work on glm objects expect the stored call to contain the correct name of the object holding the data.

    # load packages (p_load comes from pacman)
    library(pacman)
    p_load(dplyr, ISLR, pscl)

    # sample data
    mydata <- ISLR::Default

    # fit glm
    fitted <-
      mydata %>%
      select(default, income) %>%
      glm(default ~ ., data = ., family = binomial)

    # the stored call contains `data = .`
    fitted$call

    # pscl's pR2 pseudo-R2 function does not work
    pR2(fitted)
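
The failure can be reproduced without ISLR or pscl. The following is a minimal sketch, assuming dplyr is installed and substituting the built-in `mtcars` data for `Default`. `stats::update` is used as a stand-in for the refitting step that post-fit helpers perform, which is an assumption about what such helpers do internally.

```r
library(dplyr)  # attaches %>% and select()

# Fit through the pipe, as in the question (mtcars replaces Default)
fit <- mtcars %>%
  select(am, mpg) %>%
  glm(am ~ ., data = ., family = binomial)

# The stored call refers to the placeholder, not to a real object
fit$call$data
identical(fit$call$data, quote(.))

# Refitting re-evaluates the stored call, so `.` would have to exist here
refit <- try(update(fit, . ~ 1), silent = TRUE)
inherits(refit, "try-error")
```

Once the pipe has finished, `.` is no longer bound to anything, so any function that re-evaluates the call cannot find the data.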

How can I fix this behavior? I want to keep using pipes, including the select function. I also want to obtain a glm object in fitted that can be used with pR2 or other functions that need a working call.

One can re-arrange the data-preprocessing into the glm call, but it takes away the elegance of the code.

    fitted <-
      glm(default ~ .,
          data = mydata %>%
            select(default, income),
          family = binomial)
inferator
  • What is your question? – rg255 Jul 06 '19 at 20:18
  • This is because the `.` has a special meaning in a `glm` formula, i.e. to fit the model to all variables contained in the dataset `data` except the dependent variable. – eastclintw00d Jul 06 '19 at 20:46
  • Yes, this is a pretty common problem. You already know the solution: don't pipe into `glm`. Assign to an intermediate object. `mydata_glm <- mydata %>% ..... ; glm(...., data = mydata_glm)` – Axeman Jul 06 '19 at 20:46
  • @eastclintw00d, while that's true, I don't think that's the issue here (the issue is the `.` for data, not the `.` in the formula). – Axeman Jul 06 '19 at 20:47
  • `DescTools::PseudoR2()` updates the call via `stats::update`. It does not find the data because it looks for `.`. Yes, I can use an intermediate object, but I want something more elegant, and I want the code to be as minimal as possible. I think I want too much. – inferator Jul 06 '19 at 20:54
  • He who lives by the pipe, dies by the pipe. – IRTFM Jul 06 '19 at 21:01

1 Answer


1) Since you are explicitly writing out all the variables in the select anyway, you can just as easily write them out in the formula and drop the select. You can keep the select if you like, but it seems pointless when the variables are already given explicitly in the formula. With magrittr's exposition pipe `%$%`, this works:

    library(dplyr)
    library(magrittr)
    library(pscl)
    library(ISLR)

    fitted <- Default %$% glm(default ~ income, family = binomial)

    fitted %>% pR2
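
To see why this avoids the problem, here is a small sketch using the built-in `mtcars` data in place of `Default` (a substitution made only so that it runs without ISLR; magrittr is assumed to be installed). `update` again stands in for the refit that a helper such as `pR2` would need.

```r
library(magrittr)  # provides %$%

# %$% evaluates the right-hand side with the data frame's columns in scope,
# so glm() is called without a data argument (mtcars replaces Default)
fit <- mtcars %$% glm(am ~ mpg, family = binomial)

fit$call                 # no `data = .` entry
is.null(fit$call$data)

# Refitting should work: the updated formula keeps the environment of the
# original formula, in which the columns were exposed
null_fit <- update(fit, . ~ 1)
```

Because the call never mentions a data object, nothing has to be looked up by name when the model is refitted.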

2) Another possibility is to invert it: instead of putting glm inside the pipe, put the pipe inside glm:

    fitted <-
      glm(default ~ ., data = Default %>% select(income, default), family = binomial)

    fitted %>% pR2
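
A sketch of what the stored call looks like in this case, again with `mtcars` standing in for `Default` and assuming dplyr is installed:

```r
library(dplyr)

# The pipeline is the data argument, so the stored call holds the whole
# expression rather than the placeholder (mtcars replaces Default)
fit <- glm(am ~ ., data = mtcars %>% select(am, mpg), family = binomial)

fit$call$data            # the unevaluated pipeline
is.call(fit$call$data)

# update() re-runs that expression and can therefore rebuild the data
null_fit <- update(fit, . ~ 1)
```

The trade-off is that the preprocessing is repeated every time the call is re-evaluated.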

3) A third approach is to generate the formula argument of glm rather than the data argument.

    fitted <- Default %>%
      select(starts_with("inc")) %>%
      names %>%
      reformulate("default") %>%
      glm(data = Default, family = binomial)

    fitted %>% pR2

Replace the glm line with this if it is important that the Call: line in the output look nice.

    { do.call("glm", list(., data = quote(Default), family = quote(binomial))) }

or using purrr (note that `invoke` has been deprecated since purrr 1.0.0):

    { invoke("glm", list(., data = expr(Default), family = expr(binomial))) }
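
Putting the pieces of (3) together, the following sketch shows the generated formula and the resulting call. It uses `mtcars` in place of `Default` and the prefix `"mp"` in place of `"inc"`, both substitutions made only so that it runs without ISLR; dplyr is assumed to be installed.

```r
library(dplyr)

# Build the formula from the selected column names (mtcars replaces Default)
f <- mtcars %>%
  select(starts_with("mp")) %>%
  names() %>%
  reformulate(response = "am")
f  # am ~ mpg

# do.call() evaluates its argument list before building the call; quote()
# keeps `mtcars` and `binomial` as symbols so the stored call stays readable
fit <- do.call("glm", list(f, data = quote(mtcars), family = quote(binomial)))
fit$call$data
```

Without `quote()`, the whole data frame would be embedded in the call, which still works but makes the printed `Call:` line unreadable.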
G. Grothendieck
  • (1) `select` can use wildcards (e.g., `starts_with()`, `ends_with()`, `contains()`), or be replaced by `select_if`. It may look pointless in the minimal example, but in general, it isn't. – inferator Jul 06 '19 at 22:35
  • What is helpful, though, is pointing out the two approaches: fixing either the formula OR the data works. – inferator Jul 06 '19 at 22:42
  • Have added (3). – G. Grothendieck Jul 06 '19 at 23:40
  • This is probably the best possible solution given all the constraints. I forgot about `reformulate` -- really useful here. Great job! – inferator Jul 07 '19 at 00:56
  • Related issue, maybe worth a new Question: using map: `Default %>% group_by(student) %>% nest %>% mutate(model=map(data,glm,formula=formul,family='binomial')) %>% mutate(pr2=map(model,pR2))` – inferator Jul 31 '19 at 02:20