I am using the code from this question (below) to save the columns of a nested tibble into a new list of tibbles (each column becoming one tibble in the list). However, when I use select on the nested tibble, the nesting variable is lost. I'd like to retain it, so that the grouping variable stays with the results.

e.g., results %>% unnest(tidied) keeps "carb", but 'results %>% select(tidied) %>% map(~bind_rows(.))' does not.

How can I keep the nested variable with the selected columns?

library(tidyverse)
library(broom)
data(mtcars)
df <- mtcars

# tidyr >= 1.0 syntax; older versions used nest(-carb)
nest.df <- df %>% nest(data = -carb)

results <- nest.df %>% 
  mutate(fit = map(data, ~ lm(mpg ~ wt, data=.x)),
         tidied = map(fit, tidy),
         glanced = map(fit, glance),
         augmented = map(fit, augment))

final <- results %>% select(glanced, tidied, augmented ) %>% 
        map(~bind_rows(.))
nofunsally

1 Answer

We can use mutate_at before the select step (the expected output is not entirely clear, though). Here mutate_at loops through each of the selected columns. Each of these columns is itself a list of tibbles, so inside the function (list(~ ...)) we use map2 to pass the column together with the 'carb' column, and then mutate each tibble in the list to add a new 'carb' column.

results %>%
  mutate_at(vars(glanced, tidied, augmented), 
          list(~ map2(.,carb, ~ .x %>% mutate(carb = .y)))) %>% 
  select(glanced, tidied, augmented) %>% 
  map(~ bind_rows(.x))
#$glanced
# A tibble: 6 x 12
#  r.squared adj.r.squared  sigma statistic   p.value    df logLik    AIC    BIC deviance df.residual  carb
#      <dbl>         <dbl>  <dbl>     <dbl>     <dbl> <int>  <dbl>  <dbl>  <dbl>    <dbl>       <int> <dbl>
#1   0.696           0.658   2.29  18.3      0.00270      2 -21.4    48.7   49.6    41.9            8     4
#2   0.654           0.585   3.87   9.44     0.0277       2 -18.2    42.4   42.3    74.8            5     1
#3   0.802           0.777   2.59  32.3      0.000462     2 -22.6    51.1   52.1    53.5            8     2
#4   0.00295        -0.994   1.49   0.00296  0.965        2  -3.80   13.6   10.9     2.21           1     3
#5   0               0     NaN     NA       NA            1 Inf    -Inf   -Inf       0              0     6
#6   0               0     NaN     NA       NA            1 Inf    -Inf   -Inf       0              0     8

#$tidied
# A tibble: 10 x 6
#   term        estimate std.error statistic      p.value  carb
#   <chr>          <dbl>     <dbl>     <dbl>        <dbl> <dbl>
# 1 (Intercept)   27.9       2.91     9.56     0.0000118      4
# 2 wt            -3.10      0.724   -4.28     0.00270        4
#...
#...
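
In dplyr 1.0 and later, mutate_at is superseded by across(). The same idea can be sketched with across() on a small made-up stand-in for results. The `toy` tibble and all of its values are illustrative assumptions, not output from the models above, and the inner function is written out with named arguments (`tbl`, `g`) to avoid two nested `.x` placeholders.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for `results`: a grouping column plus two
# list-columns of tibbles. All numbers are made up for illustration.
toy <- tibble(
  carb = c(4, 1),
  tidied = list(
    tibble(term = c("(Intercept)", "wt"), estimate = c(1, 2)),
    tibble(term = c("(Intercept)", "wt"), estimate = c(3, 4))
  ),
  glanced = list(tibble(r.squared = 0.5), tibble(r.squared = 0.6))
)

final <- toy %>%
  # For each list-column, pair every tibble with the carb value of its row
  # and add that value as a new column inside the tibble.
  mutate(across(c(tidied, glanced),
                ~ map2(.x, carb, function(tbl, g) mutate(tbl, carb = g)))) %>%
  select(tidied, glanced) %>%
  # Each remaining column is a list of tibbles; stack each into one tibble.
  map(bind_rows)
```

Here `final` is a named list with one stacked tibble per selected column, each carrying its own carb column.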
akrun
  • This works. If possible, I'd appreciate a bit of explanation about what is happening here: `mutate_at(vars(glanced, tidied, augmented), list(~ map2(.,carb, ~ .x %>% mutate(carb = .y))))`. Does `mutate_at` use those variables in the function that follows, `list(~ map2(.,carb, ~ .x %>% mutate(carb = .y)))`? If so, maybe unpacking `list(~ map2(.,carb, ~ .x %>% mutate(carb = .y)))` a bit would be helpful. – nofunsally Jun 20 '19 at 16:46
  • @nofunsally I added some explanations in the post. Here, each of your data columns is a `list` of `tibble`s. So what `map2` does is loop through the list elements one by one; as each individual element is a `tibble`, you can use all the functions/methods for a normal data.frame/tibble, i.e. `summarise/mutate` etc. The key here is that we are passing 2 arguments to map2: one of them is a list and the second is a vector, and both are stepped through one element at a time. So, it loops through each pair of corresponding elements and creates the 'carb' column from the value passed in map2 – akrun Jun 20 '19 at 16:57
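
To see the pairing that map2 does in isolation, here is a minimal sketch with made-up inputs. `tbls` and `ids` are hypothetical names standing in for one list-column and the carb column.

```r
library(dplyr)
library(purrr)
library(tibble)

# A list of tibbles (like one list-column) and a vector of the same length
# (like the carb column). The values are made up.
tbls <- list(tibble(x = 1:2), tibble(x = 3:5))
ids  <- c(4, 1)

# map2 walks both inputs in parallel: .x is the i-th tibble, .y the i-th id.
tagged <- map2(tbls, ids, ~ mutate(.x, carb = .y))

# Stack the tagged tibbles; each row keeps the id of the tibble it came from.
stacked <- bind_rows(tagged)
```

The first tibble's two rows get carb = 4 and the second tibble's three rows get carb = 1, which is what the mutate_at call does for every selected column of results.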