3

I posted a question here and was able to reproduce Claus's answer, which calculates the R-squared value for each species from an additive model fitted to the iris data using the tidyverse. However, after a package update the R-squared values are no longer displayed, and I am not sure why.
Here are Claus's code and output:

library(tidyverse)
library(broom)
iris %>% nest(-Species) %>% 
  mutate(fit = map(data, ~mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
         results = map(fit, glance),
         R.square = map(fit, ~ summary(.)$r.sq)) %>%
  unnest(results) %>%
  select(-data, -fit)

#      Species  R.square       df    logLik      AIC      BIC deviance df.residual
# 1     setosa 0.5363514 2.546009 -1.922197 10.93641 17.71646 3.161460    47.45399
# 2 versicolor 0.2680611 2.563623 -3.879391 14.88603 21.69976 3.418909    47.43638
# 3  virginica 0.1910916 2.278569 -7.895997 22.34913 28.61783 4.014793    47.72143

Yet the same code now produces the following output for me, with <dbl [1]> in the R.square column instead of the values:

library(tidyverse)
library(broom)
iris %>% nest(-Species) %>% 
  mutate(fit = map(data, ~mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
         results = map(fit, glance),
         R.square = map(fit, ~ summary(.)$r.sq)) %>%
  unnest(results) %>%
  select(-data, -fit)

     Species  R.square       df    logLik      AIC      BIC deviance
      <fctr>    <list>    <dbl>     <dbl>    <dbl>    <dbl>    <dbl>
1     setosa <dbl [1]> 2.396547 -1.973593 10.74028 17.23456 3.167966
2 versicolor <dbl [1]> 2.317501 -4.021222 14.67745 21.02058 3.438361
3  virginica <dbl [1]> 2.278569 -7.895997 22.34913 28.61783 4.014793

Can anyone provide insight as to why?

llrs
George
  • I am able to get the first output. What are your package versions? I have `broom_0.4.3`, `dplyr_0.7.4`, `purrr_0.2.4` – akrun Feb 07 '18 at 08:33
  • FWIW, I get the second output, but my sessionInfo says `broom_0.4.3`, `dplyr_0.7.4`, `purrr_0.2.4`, and `mgcv_1.8-23`. – Stephen Henderson Feb 07 '18 at 08:44
  • I think it's the `mgcv` version. If I simplify to `mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = iris) %>% glance`, my result has no R-squared. Since I have the same broom as @akrun, maybe the `gam` model is formatted differently? – Stephen Henderson Feb 07 '18 at 08:52
  • I have broom_0.4.3, dplyr_0.7.4 purrr_0.2.4 and mgcv_1.8-23 – George Feb 07 '18 at 08:56
  • @akrun which version of mgcv are you running? – George Feb 07 '18 at 09:11
  • I see an error during install of tidyverse `There are binary versions available but the source versions are later: binary source needs_compilation dbplyr 1.1.0 1.2.0 FALSE lubridate 1.7.1 1.7.2 TRUE` – George Feb 07 '18 at 09:13
  • Sorry, see my answer. I think the `mgcv` version was a red herring. But I'm still not sure why it works for @akrun as is. – Stephen Henderson Feb 07 '18 at 09:14
  • I have `packageVersion("mgcv") # [1] ‘1.8.22’`; that explains it. – akrun Feb 07 '18 at 09:16
  • @akrun I don't see how this can have anything to do with `mgcv`. `summary(.)$r.sq` is a double, and `map()` puts it into a list, as is documented. However, for some of us it doesn't. (I wrote the original code, and it works for me.) Not sure why, but it would seem to me to be a `purrr` or `tibble` issue. – Claus Wilke Feb 07 '18 at 15:17
  • @ClausWilke You are right that `map` should return a `list`, but with this code, for some reason, it does not. I double-checked, and I get the same output as I stated earlier. – akrun Feb 07 '18 at 15:21

1 Answer

5

I have the same sessionInfo as the OP (see comments above). I can fix this by forcing R-squared to be a double using map_dbl. I'm not totally sure why it works for akrun as is.

iris %>% nest(-Species) %>% 
  mutate(fit = map(data, ~mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
         results = map(fit, glance),
         R.square = map_dbl(fit, ~ summary(.)$r.sq)) %>%
  unnest(results) %>%
  select(-data, -fit)

# A tibble: 3 x 8
  Species    R.square    df logLik   AIC   BIC deviance df.residual
  <fct>         <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <dbl>
1 setosa        0.536  2.55  -1.92  10.9  17.7     3.16        47.5
2 versicolor    0.268  2.56  -3.88  14.9  21.7     3.42        47.4
3 virginica     0.191  2.28  -7.90  22.3  28.6     4.01        47.7
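
As a minimal sketch of the difference between the two calls (assuming purrr and mgcv are installed; the names fits, r2_list and r2_dbl are made up for this illustration): map() always returns a list, which a tibble prints as <dbl [1]> per row, while map_dbl() returns a plain double vector.

```r
library(purrr)

# Fit one GAM per species, mirroring the model used in the question.
# `fits` is a hypothetical name; it is a named list of three gam objects.
fits <- map(split(iris, iris$Species),
            ~ mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .x))

# map() wraps each R-squared in a list element (one length-1 double each).
r2_list <- map(fits, ~ summary(.x)$r.sq)

# map_dbl() simplifies the same results into a named double vector.
r2_dbl <- map_dbl(fits, ~ summary(.x)$r.sq)

is.list(r2_list)   # TRUE
is.double(r2_dbl)  # TRUE
```

Inside mutate(), the first form creates a list column and the second an ordinary numeric column, which is why the values become visible again in the output above.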
Stephen Henderson
  • I think `map_dbl` is the correct way to do it. Not sure why it works for me with just `map`, it really shouldn't. I've got `dplyr_0.7.4` and `purrr_0.2.4`. – Claus Wilke Feb 07 '18 at 15:15