I am calculating summary statistics based on nested dataframes, and I am adding the statistics in unnested columns. Here is a toy example:

library(purrr)
library(dplyr)
library(tidyr)

df <- tribble(
  ~data,
  tibble(pop = c(1, 2, 3)), 
  tibble(pop = c(4, 5, 6))
)

df2 <- df %>% mutate(median_pop = map(.x = data, ~ .x %>%
                                  summarise(median(pop)))) %>%
  unnest(median_pop)  %>% 
  rename(median_pop = `median(pop)`)

This yields the desired result, but it requires the rename function call in the last line to regenerate the median_pop column name created in the mutate function call. It's clear to me that unnest is somehow eliminating the median_pop column name, but it's not clear to me why that's the case or how to prevent it. It seems possible that either the names_repair or the names_sep arguments to unnest might address the problem, but I do not understand how. Is it possible to retain column names when unnesting?

nicholas

1 Answer

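The column name actually depends on the summarise call inside map, not on the mutate call outside. unnest replaces the list column with the columns of the nested tibbles, so the result takes the inner names; the name given in mutate is discarded and can be anything.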

library(dplyr)
library(tidyr)
library(purrr)

df %>% 
  mutate(name_anything = map(.x = data, ~ .x %>% summarise(median_pop = median(pop)))) %>%
  unnest(name_anything)

#  data             median_pop
#  <list>                <dbl>
#1 <tibble [3 × 1]>          2
#2 <tibble [3 × 1]>          5
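
As a supplementary sketch (not part of the original answer), here are two other ways to end up with a predictable column name. The first uses map_dbl so that each row yields a plain number and no unnest is needed. The second uses the names_sep argument mentioned in the question, which, per the tidyr documentation, pastes the outer and inner column names together. The inner name `value` and the object names `df_dbl` and `df_sep` are illustrative assumptions.

```r
library(dplyr)
library(purrr)
library(tidyr)

df <- tribble(
  ~data,
  tibble(pop = c(1, 2, 3)),
  tibble(pop = c(4, 5, 6))
)

# Option 1: return one number per row, so the name given in mutate is kept
# and there is nothing to unnest.
df_dbl <- df %>%
  mutate(median_pop = map_dbl(data, ~ median(.x$pop)))

# Option 2: keep the outer name as a prefix. names_sep pastes the outer
# name (median_pop) and the inner name (value, a hypothetical choice).
df_sep <- df %>%
  mutate(median_pop = map(data, ~ summarise(.x, value = median(pop)))) %>%
  unnest(median_pop, names_sep = "_")

names(df_dbl)
names(df_sep)
```

Option 1 is usually the simpler choice when the summary is a single scalar; names_sep is more useful when each nested tibble contributes several columns that should share a prefix.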
Ronak Shah