How to drop duplicated column names in nested tibbles before unnesting

Question

I have a tibble with a column containing (nested) tibbles. The nested tibbles have duplicated data (same names, same values):

df <- tibble(id = 1:2, data = list(tibble(id = 1, var1 = "a", var2 = "b"), tibble(id = 2, var1 = "c", var2 = "d")))

df

# # A tibble: 2 x 2
#      id data            
#   <int> <list>          
# 1     1 <tibble [1 x 3]>
# 2     2 <tibble [1 x 3]>

Calling df %>% unnest(data) results in

Error: Names must be unique..

I would like to write a function that drops these columns beforehand but don't know how. My goal is to be run to use something along the lines of:

df %>% 
  drop_duplicated_cols(data) %>% 
  unnest(data)

Which would result in:

#> # A tibble: 2 x 4
#>      id var1  var2 
#>   <int> <chr> <chr>
#> 1     1 a     b    
#> 2     2 c     d

deschen · Answer 1 · 2021-09-24T21:03:00.157

You can just use unnest_longer

library(tidyverse)
df %>%
  unnest_longer(data)

which gives:

# A tibble: 2 x 2
     id data$id $var1 $var2
  <int>   <dbl> <chr> <chr>
1     1       1 a     b    
2     2       2 c     d

If you want to truly convert the tibble column, you can also use:

df %>%
  left_join(reduce(df$data, bind_rows), by = "id")

which gives:

# A tibble: 2 x 4
     id data             var1  var2 
  <dbl> <list>           <chr> <chr>
1     1 <tibble [1 x 3]> a     b    
2     2 <tibble [1 x 3]> c     d

And from there you can e.g. unselect the data column.

akrun · Answer 2 · 2021-09-24T21:08:25.560

There is a names_repair argument in unnest. By default, it is "check_unique" and this causes the error. If we change it to another option i.e. "unique", it will make the columns that are duplicated with .. followed by some digits, then we use select to get the columns not duplicated

library(dplyr)
library(tidyr)
library(stringr)
df %>%
    unnest(data, names_repair = "unique") %>%
    select(names(.)[!duplicated(str_remove(names(.), "\\.+.*"))]) %>%
    rename_with(~ str_remove(., "\\.+.*"), contains(".."))

-output

# A tibble: 2 x 3
     id var1  var2 
  <int> <chr> <chr>
1     1 a     b    
2     2 c     d

Or another option to avoid the warnings is to remove the duplicate columns within the list by looping with map

library(purrr)
out <- df %>% 
   mutate(data = map(data, ~ .x %>% 
            select(-any_of(names(df))))) %>%
   unnest(data)
out
# A tibble: 2 x 3
     id var1  var2 
  <int> <chr> <chr>
1     1 a     b    
2     2 c     d    
> str(out)
tibble [2 × 3] (S3: tbl_df/tbl/data.frame)
 $ id  : int [1:2] 1 2
 $ var1: chr [1:2] "a" "c"
 $ var2: chr [1:2] "b" "d"

NOTE: Both solutions give the expected output as in the OP's post along with the structure

score 0 · Answer 3 · answered Sep 25 '21 at 04:49

0

Combine the df$data rows together and join with the original dataframe.

library(dplyr)

left_join(df %>% select(-data), bind_rows(df$data), by = 'id')

#     id var1  var2 
#  <dbl> <chr> <chr>
#1     1 a     b    
#2     2 c     d

answered Sep 25 '21 at 04:49

Ronak Shah

377,200
20
156
213

How to drop duplicated column names in nested tibbles before unnesting

3 Answers3