I have a list-column and I would like to use c() for each group to combine these lists in summarize. This should result in one row per group, but it does not (note the code was written using dplyr >= 1.1.0):

library(dplyr)

df <- tibble::tibble(group = c("A", "A", "B"),
                     list_col = list(list("One"), list("Two"), list("Three")))

df |> 
  summarize(list_col = c(list_col),
            .by = group)

This returns:

  group list_col  
  <chr> <list>    
1 A     <list [1]>
2 A     <list [1]>
3 B     <list [1]>
Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
i Please use `reframe()` instead.
i When switching from `summarise()` to `reframe()`, remember that `reframe()` always
  returns an ungrouped data frame and adjust accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated. 

Expected Output

output <- tibble::tibble(group = c("A", "B"),
               list_col = list(list("One", "Two"), list("Three")))

  group list_col  
  <chr> <list>    
1 A     <list [2]>
2 B     <list [1]>

output$list_col[[1]]
[[1]]
[1] "One"

[[2]]
[1] "Two"

Alternate Solution

You could do something like the following code. However, (A) it changes the type of each row's element from a list to a character vector, and (B) I would specifically like to know why c() does not work:

df |>
  summarize(list_col = list(unlist(list_col)),
            .by = group)

  group list_col 
  <chr> <list>   
1 A     <chr [2]>
2 B     <chr [1]>

Within the first group (A) I expected something like the following to happen to combine the two lists into one list:

c(list("One"), list("Two"))
[[1]]
[1] "One"

[[2]]
[1] "Two"

So, why does this not work? Is this a bug or is there something with the syntax I am missing?

LMc
  • `Within the first group (A) I expected something like the following to happen to combine the two lists into one list:` It is already explained in the solution about the behavior of `c` applying on a variadic argument and using `do.call` – akrun Mar 30 '23 at 16:20

1 Answer

library(dplyr)
out <- df %>% 
  reframe(list_col = list(as.list(unlist(list_col))), .by = group)

-output

> out
# A tibble: 2 × 2
  group list_col  
  <chr> <list>    
1 A     <list [2]>
2 B     <list [1]>
> out$list_col[[1]]
[[1]]
[1] "One"

[[2]]
[1] "Two"

-OP's expected

> output$list_col[[1]]
[[1]]
[1] "One"

[[2]]
[1] "Two"

Regarding the difference between c and unlist: the default for the recursive argument is FALSE in c and TRUE in unlist

c(..., recursive = FALSE, use.names = TRUE)

unlist(x, recursive = TRUE, use.names = TRUE)

i.e. the basic difference is

> c(list("a"))
[[1]]
[1] "a"

> unlist(list("a"))
[1] "a"
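The effect of the recursive argument can be checked directly. The following is a minimal base R sketch; the variable names (nested, flat_default, one_level, c_recursive) are made up for illustration and are not part of the question.

```r
# A list of lists, shaped like the rows of group A in the question
nested <- list(list("One"), list("Two"))

# unlist() with its default recursive = TRUE flattens down to an atomic vector
flat_default <- unlist(nested)

# unlist() with recursive = FALSE removes only one level of nesting
one_level <- unlist(nested, recursive = FALSE)

# c() with recursive = TRUE descends into its list arguments
c_recursive <- c(list("One"), list("Two"), recursive = TRUE)

str(flat_default)
str(one_level)
str(c_recursive)
```

Here one_level keeps the list type that the OP wants, while the other two results are character vectors.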

When a single list with two or more elements is passed, the length of the variadic ... argument is still 1, because only one object (that list) is being passed into c.

> c(list("a", "b"))
[[1]]
[1] "a"

[[2]]
[1] "b"

Here c is not doing anything: it returns the list unchanged. That changes only when we call it through do.call, where each element of the list is passed as a separate argument

> do.call(c, list("a", "b"))
[1] "a" "b"
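One way to see how many arguments a variadic function actually receives is to count them. This is only a sketch: count_args is a hypothetical helper written for this illustration, not a base R function.

```r
# Hypothetical helper: reports how many arguments arrived through `...`
count_args <- function(...) length(list(...))

x <- list("a", "b")

# Called directly, the whole list is one argument
n_direct <- count_args(x)

# Through do.call, each element of the list becomes its own argument
n_spread <- do.call(count_args, x)

# The same distinction applies to c()
same_as_input <- identical(c(x), x)   # c(x) hands x back unchanged
spread_result <- do.call(c, x)        # equivalent to c("a", "b")

print(c(direct = n_direct, spread = n_spread))
print(same_as_input)
print(spread_result)
```

In this sketch the direct call sees 1 argument and the do.call version sees 2, which is why c(x) leaves x as it was.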

With the OP's example

> df$list_col[1:2]
[[1]]
[[1]][[1]]
[1] "One"


[[2]]
[[2]][[1]]
[1] "Two"

> c(df$list_col[1:2])
[[1]]
[[1]][[1]]
[1] "One"


[[2]]
[[2]][[1]]
[1] "Two"

> do.call(c, df$list_col[1:2])
[[1]]
[1] "One"

[[2]]
[1] "Two"

i.e. if we do

out2 <- df %>% 
  reframe(list_col = list(do.call(c, list_col)), .by = group)

-output

> out2$list_col[[1]]
[[1]]
[1] "One"

[[2]]
[1] "Two"
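To see why the original summarize call produced two rows for group A, it can help to mimic the per-group slicing in base R. The sketch below assumes that each group's expression sees only that group's slice of list_col, which matches the output shown earlier; chunks and combined are illustrative names, and this is not how dplyr is implemented internally.

```r
group <- c("A", "A", "B")
list_col <- list(list("One"), list("Two"), list("Three"))

# Slice the list-column by group, roughly what each group's expression sees
chunks <- lapply(split(seq_along(group), group), function(i) list_col[i])

# For group A the slice is already ONE list holding two lists,
# so c() returns it unchanged and its length stays 2 (hence two rows)
unchanged <- identical(c(chunks$A), chunks$A)

# Spreading the slice into separate arguments combines the inner lists
combined <- lapply(chunks, function(chunk) do.call(c, chunk))

print(unchanged)
str(combined)
```

Wrapping each combined result in list(), as in the reframe call above, is what turns it into a single row per group.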
akrun
  • Thanks, my gap in understanding was that the function is passed `list(list("One"), list("Two"))` hence `c(list(list("One"), list("Two")))` does not do anything. I thought it was passing `c(list("One"), list("Two"))`. – LMc Mar 30 '23 at 19:03
  • @LMc In the function definition, `c(...)` takes a variadic argument, so it can work only when you pass the arguments manually, do some kind of string evaluation (`eval(parse(...))`), or use `do.call` – akrun Mar 30 '23 at 19:08
  • Right, I understand. My confusion was more about the structure of grouped data that summarize passes to a user-specified function like `c()`. – LMc Mar 30 '23 at 19:17