Aggregate character string into vector in R

Question

I have a data table test:

id	key
1	2365
1	2365
1	3709
2	6734
2	1908
2	4523

I want to aggregate unique key values by id into vector using data.table package.

Expected output:

id	key_array
1	"2365", "3709"
2	"6734", "1908", "4523"

So, this should work like array_agg sql function.

I tried:
res <- test[, list(key_array = paste(unique(key), collapse = ", ")), by = "id"], but I get just a string. But I need to have opportunity to find the length of each vector and operate with its certain elements (find the intersection of two vectors for example).

Rui Barradas · Accepted Answer · 2022-06-14T22:05:41.210

1. Base R

This an aggregate one-liner.

x <- 'id    key
1   2365
1   2365
1   3709
2   6734
2   1908
2   4523'
test <- read.table(textConnection(x), header = TRUE)

aggregate(key ~ id, test, \(x) c(unique(x)))
#>   id              key
#> 1  1       2365, 3709
#> 2  2 6734, 1908, 4523

^{Created on 2022-06-14 by the reprex package (v2.0.1)}

But if user @Chris's comment is right then the right solution as follows.

aggregate(key ~ id, test, \(x) paste(unique(x), collapse = ", "))

Note that both c(unique(x)) and as.character(c(unique(x))) will output a list column, so the latter solution is right anyway.

2. Package `data.table`

Once again a one-liner.

The output is a list column, with each list member an integer vector. To keep as integers use

list(unique(key))

instead.

suppressPackageStartupMessages(library(data.table))

res <- setDT(test)[, .(key_array = list(as.character(unique(key)))), by = id]
res
#>    id      key_array
#> 1:  1      2365,3709
#> 2:  2 6734,1908,4523

str(res)
#> Classes 'data.table' and 'data.frame':   2 obs. of  2 variables:
#>  $ id       : int  1 2
#>  $ key_array:List of 2
#>   ..$ : chr  "2365" "3709"
#>   ..$ : chr  "6734" "1908" "4523"
#>  - attr(*, ".internal.selfref")=<externalptr>

^{Created on 2022-06-14 by the reprex package (v2.0.1)}

Then, in order to access the vectors use two extractors, one to extract the column and the other one to extract the vectors.

res$key_array[[1]]
#> [1] "2365" "3709"
res$key_array[[2]]
#> [1] "6734" "1908" "4523"

^{Created on 2022-06-14 by the reprex package (v2.0.1)}

3. `dplyr` solution

Group by id and collapse the unique strings into one only.

suppressPackageStartupMessages(library(dplyr))

test %>%
  group_by(id) %>%
  summarise(key_array = paste(unique(key), collapse = ", "))
#> # A tibble: 2 × 2
#>      id key_array  
#>   <int> <chr>           
#> 1     1 2365, 3709      
#> 2     2 6734, 1908, 4523

^{Created on 2022-06-14 by the reprex package (v2.0.1)}

For whatever reason, it appears `as.character(c(unique(x))))` is desired output. Just speaking to `base::aggregate` here. — Chris, Jun 14 '22 at 21:43
Thanks a lot, Rui! I'm interested in a `data.table` approach and your solution ```setDT(test)[, .(key_array = paste(unique(key), collapse = ", ")), by = id]``` is the same as mine, but unfortunately it gives a string instead of a vector — Hilary, Jun 14 '22 at 21:49
@Hilary Changed, see if this is what you want. Also, should I remove the `dplyr` solution, since it is not asked for? — Rui Barradas, Jun 14 '22 at 21:55
It is not necessary to remove the `dplyr` solution, maybe it will be useful to someone else — Hilary, Jun 14 '22 at 22:06

Aggregate character string into vector in R

1 Answers1

1. Base R

2. Package data.table

3. dplyr solution

2. Package `data.table`

3. `dplyr` solution