I'm running into a problem with purrr::map and broom::tidy when fitting loglinear GLMs in R. Model p-values do not print when I run many models, but they do print for a single model. I'd like the multiple-model output to include p-values for each model, as in the single-model case. The example uses the built-in "Titanic" data set (see William King's website).

data(Titanic)

#convert to data frame
T.df <- as.data.frame(Titanic)

head(T.df)

#run glm as loglinear model
model1 <- glm(Freq ~ Sex * Survived, family = poisson, data = T.df)

#print model with tidy--p-values print here
broom::tidy(anova(model1, test = "Chisq"))

#Now run multiple models by class
#Note the models print just fine but without p values
T.df %>%
 tidyr::nest(-Class) %>%
  dplyr::mutate(
    fit = purrr::map(data, ~ anova(glm(Freq ~ Sex * Survived, family = poisson, data = .x)), test="Chisq"),
    tidied = purrr::map(fit, broom::tidy)
  ) %>%
  tidyr::unnest(tidied)

While I'm thinking about it, how does one stop broom::tidy from printing the warning messages about unrecognized columns?

Thanks in advance.

Joe
  • Have you looked at the [broom and dplyr](https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html) vignette? Might be a little out of date since they are encouraging moving away from `do`, but should still work – Calum You Mar 19 '19 at 20:55

1 Answer

The issue is a misplaced closing parenthesis in the anova call: test = "Chisq" ends up outside anova(), so it is passed to map instead, i.e.

anova(glm(Freq ~ Sex * Survived, family = poisson, data = .x)), test="Chisq")
                                                           ^^^
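To illustrate why the misplaced argument produces no error, here is a minimal sketch with a toy lambda (the `.x * 10` body is made up for illustration). It assumes purrr's usual behaviour of forwarding map()'s extra arguments to the mapped function, where a formula lambda accepts and ignores them:

```r
library(purrr)

# `test` is an argument of map() here, not of anything called inside the lambda.
# map() forwards it through `...` to the lambda, which silently ignores it.
res <- map(1:2, ~ .x * 10, test = "Chisq")

# The results are the same as if `test` had never been supplied.
str(res)
```

In the question's code the same thing happens to test = "Chisq": anova() never receives it, so it runs with its default settings.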

Implementing it with the closing parentheses in the right place:

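# Load the packages that provide %>%, mutate(), nest()/unnest(), and map()
library(dplyr)
library(tidyr)
library(purrr)

T.df %>%
  nest(-Class) %>%
  mutate(tidied = map(data, ~ 
     glm(Freq ~ Sex * Survived, family = poisson, data = .x) %>% 
     anova(., test = "Chisq") %>% 
     broom::tidy(.))) %>% 
  unnest(tidied)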
# A tibble: 16 x 7
#   Class term            df Deviance Resid..Df Resid..Dev    p.value
#   <fct> <chr>        <int>    <dbl>     <int>      <dbl>      <dbl>
# 1 1st   NULL            NA    NA            7       590. NA        
# 2 1st   Sex              1     3.78         6       586.  5.20e-  2
# 3 1st   Survived         1    20.4          5       566.  6.28e-  6
# 4 1st   Sex:Survived     1   162.           4       404.  4.78e- 37
# 5 2nd   NULL            NA    NA            7       476. NA        
# 6 2nd   Sex              1    18.9          6       457.  1.37e-  5
# 7 2nd   Survived         1     8.47         5       449.  3.62e-  3
# 8 2nd   Sex:Survived     1   163.           4       286.  2.54e- 37
# 9 3rd   NULL            NA    NA            7       876. NA        
#10 3rd   Sex              1   145.           6       732.  2.54e- 33
#11 3rd   Survived         1   181.           5       550.  2.36e- 41
#12 3rd   Sex:Survived     1    57.8          4       493.  2.92e- 14
#13 Crew  NULL            NA    NA            7      2535. NA        
#14 Crew  Sex              1  1014.           6      1522.  2.02e-222
#15 Crew  Survived         1   252.           5      1269.  7.85e- 57
#16 Crew  Sex:Survived     1    42.4          4      1227.  7.63e- 11
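The question's side note about the broom::tidy warnings is not addressed above. One possible approach, offered as an assumption and not as broom's recommended method, is to wrap the call in base R's suppressWarnings(). A minimal sketch using the single-model example from the question:

```r
library(broom)

data(Titanic)
T.df <- as.data.frame(Titanic)

model1 <- glm(Freq ~ Sex * Survived, family = poisson, data = T.df)

# suppressWarnings() silences every warning raised inside the call,
# not only the one about unrecognized columns, so use it with care.
tidied <- suppressWarnings(broom::tidy(anova(model1, test = "Chisq")))
tidied
```

The same wrapper can go around broom::tidy(.) inside the map() lambda. Because it hides all warnings, it is worth running the code once without it to check that nothing else is being reported.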
akrun
    Thanks so much! A simple fix and more efficient code. Piping the model output to the anova call works so well. – Joe Mar 19 '19 at 20:56