2

I have a data table test:

id key
1 2365
1 2365
1 3709
2 6734
2 1908
2 4523

I want to aggregate unique key values by id into vector using data.table package.

Expected output:

id key_array
1 "2365", "3709"
2 "6734", "1908", "4523"

So, this should work like array_agg sql function.

I tried:
res <- test[, list(key_array = paste(unique(key), collapse = ", ")), by = "id"], but I get just a string. But I need to have opportunity to find the length of each vector and operate with its certain elements (find the intersection of two vectors for example).

Hilary
  • 475
  • 3
  • 10

1 Answers1

3

1. Base R

This an aggregate one-liner.

x <- 'id    key
1   2365
1   2365
1   3709
2   6734
2   1908
2   4523'
test <- read.table(textConnection(x), header = TRUE)

aggregate(key ~ id, test, \(x) c(unique(x)))
#>   id              key
#> 1  1       2365, 3709
#> 2  2 6734, 1908, 4523

Created on 2022-06-14 by the reprex package (v2.0.1)

But if user @Chris's comment is right then the right solution as follows.

aggregate(key ~ id, test, \(x) paste(unique(x), collapse = ", "))

Note that both c(unique(x)) and as.character(c(unique(x))) will output a list column, so the latter solution is right anyway.


2. Package data.table

Once again a one-liner.

The output is a list column, with each list member an integer vector. To keep as integers use

list(unique(key))

instead.

suppressPackageStartupMessages(library(data.table))

res <- setDT(test)[, .(key_array = list(as.character(unique(key)))), by = id]
res
#>    id      key_array
#> 1:  1      2365,3709
#> 2:  2 6734,1908,4523

str(res)
#> Classes 'data.table' and 'data.frame':   2 obs. of  2 variables:
#>  $ id       : int  1 2
#>  $ key_array:List of 2
#>   ..$ : chr  "2365" "3709"
#>   ..$ : chr  "6734" "1908" "4523"
#>  - attr(*, ".internal.selfref")=<externalptr>

Created on 2022-06-14 by the reprex package (v2.0.1)

Then, in order to access the vectors use two extractors, one to extract the column and the other one to extract the vectors.

res$key_array[[1]]
#> [1] "2365" "3709"
res$key_array[[2]]
#> [1] "6734" "1908" "4523"

Created on 2022-06-14 by the reprex package (v2.0.1)


3. dplyr solution

Group by id and collapse the unique strings into one only.

suppressPackageStartupMessages(library(dplyr))

test %>%
  group_by(id) %>%
  summarise(key_array = paste(unique(key), collapse = ", "))
#> # A tibble: 2 × 2
#>      id key_array  
#>   <int> <chr>           
#> 1     1 2365, 3709      
#> 2     2 6734, 1908, 4523

Created on 2022-06-14 by the reprex package (v2.0.1)

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • 1
    For whatever reason, it appears `as.character(c(unique(x))))` is desired output. Just speaking to `base::aggregate` here. – Chris Jun 14 '22 at 21:43
  • Thanks a lot, Rui! I'm interested in a `data.table` approach and your solution ```setDT(test)[, .(key_array = paste(unique(key), collapse = ", ")), by = id]``` is the same as mine, but unfortunately it gives a string instead of a vector – Hilary Jun 14 '22 at 21:49
  • @Hilary Changed, see if this is what you want. Also, should I remove the `dplyr` solution, since it is not asked for? – Rui Barradas Jun 14 '22 at 21:55
  • 1
    It is not necessary to remove the `dplyr` solution, maybe it will be useful to someone else – Hilary Jun 14 '22 at 22:06
  • @Hilary Got it, will keep all solutions. – Rui Barradas Jun 14 '22 at 22:07