How to "spread" a list-column?

Question

Consider this simple example:

mydf <- data_frame(regular_col = c(1,2),
                   normal_col = c('a','b'),
                   weird_col = list(list('hakuna', 'matata'),
                                 list('squash', 'banana')))

> mydf
# A tibble: 2 x 3
  regular_col normal_col weird_col 
        <dbl> <chr>      <list>    
1           1 a          <list [2]>
2           2 b          <list [2]>

I would like to extract the elements of weird_col programmatically (the number of elements may change) so that each element is placed in a separate column. That is, I expect the following output:

> data_frame(regular_col = c(1,2),
+           normal_col = c('a','b'),
+           weirdo_one = c('hakuna', 'squash'),
+           weirdo_two = c('matata', 'banana'))
# A tibble: 2 x 4
  regular_col normal_col weirdo_one weirdo_two
        <dbl> <chr>      <chr>      <chr>     
1           1 a          hakuna     matata
2           2 b          squash     banana

However, I am unable to do so in simple terms. For instance, using the classic unnest fails here, as it expands the dataframe instead of placing each element of the list in a different column.

> mydf %>% unnest(weird_col)
# A tibble: 4 x 3
  regular_col normal_col weird_col
        <dbl> <chr>      <list>   
1           1 a          <chr [1]>
2           1 a          <chr [1]>
3           2 b          <chr [1]>
4           2 b          <chr [1]>

Is there any solution in the tidyverse for that?

`mydf%>%group_by(regular_col)%>%mutate(weird_col = invoke(paste,weird_col,collapse=","))%>%separate(weird_col,c("col1","col2"))` — Onyambu, Aug 12 '18 at 23:05
@Onyambu pretty cool as well. what is the purpose of `invoke` here? — ℕʘʘḆḽḘ, Aug 12 '18 at 23:36
`invoke` is similar to `do.call` (and it's a simple wrapper round it, if you look at the code), the main difference is that it has an additional `...` argument, that @Onyambu uses here to specify `collapse=","` — moodymudskipper, Aug 13 '18 at 13:07
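To make the comment above concrete, here is a minimal base R sketch of what the `invoke(paste, weird_col, collapse = ",")` step amounts to for a single row, assuming `invoke` forwards to `do.call` as described. The names `one_row_cell`, `joined` and `pieces` are hypothetical, and `strsplit` stands in for what `separate` does afterwards.

```r
# Inside one group, the list-column holds a single cell: the inner list.
one_row_cell <- list(list("hakuna", "matata"))

# Roughly what invoke(paste, one_row_cell, collapse = ",") does:
# the extra named argument is appended to the argument list.
joined <- do.call(paste, c(one_row_cell, list(collapse = ",")))
print(joined)

# Splitting the string again recovers one value per future column.
pieces <- strsplit(joined, ",", fixed = TRUE)[[1]]
print(pieces)
```

Note that this only round-trips cleanly when the values themselves contain no commas.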

score 11 · Accepted Answer · answered Aug 12 '18 at 22:50

You can extract the values from the output of unnest, build the new column names, and then spread back out. Note that I use flatten_chr because your list-column is only one level deep; if it is nested more deeply, you can use flatten instead, since spread works just as well on list-columns.

library(tidyverse)
#> Warning: package 'dplyr' was built under R version 3.5.1
mydf <- data_frame(
  regular_col = c(1, 2),
  normal_col = c("a", "b"),
  weird_col = list(
    list("hakuna", "matata"),
    list("squash", "banana")
  )
)
mydf %>%
  unnest(weird_col) %>%
  group_by(regular_col, normal_col) %>%
  mutate(
    weird_col = flatten_chr(weird_col), # or just as.character
    weird_colname = str_c("weirdo_", row_number())
  ) %>%
  spread(weird_colname, weird_col)
#> # A tibble: 2 x 4
#> # Groups:   regular_col, normal_col [2]
#>   regular_col normal_col weirdo_1 weirdo_2
#>         <dbl> <chr>      <chr>    <chr>   
#> 1           1 a          hakuna   matata  
#> 2           2 b          squash   banana

Created on 2018-08-12 by the reprex package (v0.2.0).
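For readers on newer tidyr releases (1.0.0 onward), where `spread` is superseded, a shorter route may be `unnest_wider`. This is a sketch under that version assumption and not part of the original answer; the resulting column names (`weird_col_1`, `weird_col_2`) come from `names_sep` and differ from the `weirdo_*` names used above.

```r
library(tibble)
library(tidyr)

mydf <- tibble(
  regular_col = c(1, 2),
  normal_col  = c("a", "b"),
  weird_col   = list(
    list("hakuna", "matata"),
    list("squash", "banana")
  )
)

# Each element of the list-column becomes its own column.
# names_sep is needed because the inner lists are unnamed.
wide <- unnest_wider(mydf, weird_col, names_sep = "_")
print(wide)
```

If the `weirdo_*` names matter, a `rename_with` step afterwards is one way to get them.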

score 5 · Answer 2 · moodymudskipper · 2018-08-13T13:06:29.070

unnest expands lists and vectors vertically (into rows) and one-row data frames horizontally (into columns). So what we can do is turn each of your lists into a one-row data frame (with suitable column names) and unnest afterwards.

mydf %>%
  mutate(weird_col = map(
    weird_col,
    # name the elements weirdo_1, weirdo_2, ... and build a one-row tibble
    ~ as_tibble(setNames(.x, paste0("weirdo_", seq_along(.x))))
  )) %>%
  unnest(weird_col)

# # A tibble: 2 x 4
#   regular_col normal_col weirdo_1 weirdo_2
#         <dbl>      <chr>    <chr>    <chr>
# 1           1          a   hakuna   matata
# 2           2          b   squash   banana

edited Aug 13 '18 at 13:06

answered Aug 13 '18 at 13:00


pretty nice and concise! – ℕʘʘḆḽḘ Aug 13 '18 at 13:06
It's similar to what @Onyambu suggested in the comments, except that he first transforms the list into a comma-separated string, which is then spread horizontally with `separate` – moodymudskipper Aug 13 '18 at 13:12
would this solution work if the number of elements in the list could vary in different rows? – ℕʘʘḆḽḘ Aug 13 '18 at 13:13
yes it would: try removing `'matata'` from the first list, and you would get `NA` in column `weirdo_2` – moodymudskipper Aug 13 '18 at 13:15
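The varying-length case from the last comment can be sketched as follows. It assumes a recent tidyverse (`tibble`, `as_tibble` and `unnest(weird_col)` in place of the older `data_frame`, `as_data_frame` and bare `unnest`); `ragged_df` and `ragged_wide` are hypothetical names introduced for the illustration.

```r
library(dplyr)
library(purrr)
library(tibble)
library(tidyr)

# Same data as the question, but 'matata' is removed from the first list.
ragged_df <- tibble(
  regular_col = c(1, 2),
  normal_col  = c("a", "b"),
  weird_col   = list(
    list("hakuna"),
    list("squash", "banana")
  )
)

ragged_wide <- ragged_df %>%
  mutate(weird_col = map(
    weird_col,
    # one-row tibble per cell, with columns weirdo_1, weirdo_2, ...
    ~ as_tibble(setNames(.x, paste0("weirdo_", seq_along(.x))))
  )) %>%
  unnest(weird_col)

print(ragged_wide)
```

The first row has no second element, so its `weirdo_2` entry should come out as `NA`.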
