0

I have data as follows:

library(data.table)
dat <- structure(list(year2006 = c("1110", "1110", "1110", "1110", "1120", 
"1120", "1120", "1120"), group2006 = c("1", "2", "3", "4", "1", 
"2", "3", "4"), min2006 = c("1.35", "2", "3.7", 
"4.25", "5.6", "4.45", "3.09", "1.13"), 
    year2007 = c("1110", "1110", "1110", "1110", "1120", "1120", 
    "1120", "1120"), group2007 = c("1", "2", "3", "4", "1", 
    "2", "3", "4"), min2007 = c("5", "5.05", "5", 
    "1.59", "2.3", "3", "4.05", "5.16"
    )), row.names = c(NA, -8L), class = c("data.table", "data.frame"
))
dat

   year2006  group2006 min2006 year2007  group2007 min2007
1:     1110          1    1.35     1110          1       5
2:     1110          2       2     1110          2    5.05
3:     1110          3     3.7     1110          3       5
4:     1110          4    4.25     1110          4    1.59
5:     1120          1     5.6     1120          1     2.3
6:     1120          2    4.45     1120          2       3
7:     1120          3    3.09     1120          3    4.05
8:     1120          4    1.13     1120          4    5.16

What I would like to do, is to create a list of the numbers in min200x, per category in year200x.

Desired output:

   cat       year2006                         year2007
1:     1110  c("1.35", "2", "3.7", "4.25")    c("5", "5.05", "5", "1.59") 
2:     1120  c("5.6", "4.45", "3.09", "1.13") c("2.3", "3", "4.05", "5.16")

I thought I could do something like:

setDT(dat)[, cat := list(min2006), by=year2006]

But that does not work (it just puts the min2006 item in a new colum cat). And even if it did, it would only provide a solution for the year 2006. How should I go about this?

Tom
  • 2,173
  • 1
  • 17
  • 44

3 Answers3

4

I'm not sure why your columns in your test data are all character but the columns in your desired output are numeric. Also, you ask for a list of numbers by group but your expected output shows a vector.

Nevertheless, here's a tidyverse solution that creates list columns.

library(tidyverse)

x <- dat %>% 
  mutate(across(everything(), as.numeric)) %>% 
  group_by(year2006) %>% 
  select(year2006, starts_with("min")) %>% 
  summarise(across(everything(), lst))
x
# A tibble: 2 × 3
  year2006 min2006      min2007     
     <dbl> <named list> <named list>
1     1110 <dbl [4]>    <dbl [4]>   
2     1120 <dbl [4]>    <dbl [4]>

and, for example,

x$min2006
$min2006
[1] 1.35 2.00 3.70 4.25

$min2006
[1] 5.60 4.45 3.09 1.13

If your inputs are actually numeric, you can lose the mutate.

Edit

... and to get the correct name for the grouping column, you can add %>% rename(cat=year2006) to the pipe. Apologies for the omission.

Limey
  • 10,234
  • 2
  • 12
  • 32
  • This is great, thank you very much! I just wrote the desired output by hand and forgot about the fact that they were characters in the input. I'll make the desired output characters as well. – Tom May 31 '22 at 08:44
  • No problem. I just wanted to explain why (a) there were discrepancies between my solution and the expected output and (b) an apparently unnecessary step. Of course, wanting character lists is another reason to lose the `mutate`. – Limey May 31 '22 at 08:49
3

a similar approach

data.table


library(data.table)
COLS <- grep(names(df), pattern = "^min", value = TRUE)

setDT(df)[, lapply(.SD, list), .SDcol = COLS, by = year2006]
#>    year2006            min2006         min2007
#> 1:     1110    1.35,2,3.7,4.25   5,5.05,5,1.59
#> 2:     1120 5.6,4.45,3.09,1.13 2.3,3,4.05,5.16

Created on 2022-05-31 by the reprex package (v2.0.1)

Yuriy Saraykin
  • 8,390
  • 1
  • 7
  • 14
2

Here is also a base R solution,

l1 <- lapply(split.default(dat, gsub('\\D+', '', names(dat))), function(i) 
                                       aggregate(as.matrix(i[3]) ~ as.matrix(i[1]), i, list))

do.call(cbind, l1)[-3]

#  year2006          2006.min2006       2007.min2007
#1     1110    1.35, 2, 3.7, 4.25   5, 5.05, 5, 1.59
#2     1120 5.6, 4.45, 3.09, 1.13 2.3, 3, 4.05, 5.16
Sotos
  • 51,121
  • 6
  • 32
  • 66