1
library(data.table)
data = data.table("STUDENT" = c(1:100),
                  "SAMPLEWEIGHT" = sample(12:99, r = T, 100),
"LABEL1" = sample(1:2, r = T, 100),
"LABEL3" = sample(1:3, r = T, 100),
"CAT"=sample(0:1,r = T, 100),
"FOX"=sample(0:1,r = T, 100),
"DOG"=sample(0:1,r = T, 100),
"MOUSE"=sample(0:1,r = T, 100),
"BIRD"=sample(0:1,r = T, 100))

dataWANT = data.frame("LABEL1" = c(1,1,1,2,2,2),
                                            "LABEL3" = c(1,2,3,1,2,3),
                                            "CAT_N" = NA,
                                            "CAT_PER" = NA,
                                            "FOX_N" = NA,
                                            "FOX_PER" = NA,
                                            "DOG_N" = NA,
                                            "DOG_PER" = NA,
                                            "MOUSE_N" = NA,
                                            "MOUSE_PER" = NA,
                                            "BIRD_N" = NA,
                                            "BIRD_PER" = NA)

I have a data.table call it data, and am attempting to try and summarize the student data like what is shown in dataWANT.

in dataWANT the columns that have _N at the end is just the count of values in the column that equals to 1 for each LABEL1 and LABEL3 combination so a total of 6 groups.

in dataWANT the columns that have _PER at the end is the weighted proportion of the groups that have ones in their column.

bvowe
  • 3,004
  • 3
  • 16
  • 33
  • 1
    Do you need `data[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")), setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)), paste0(names(.SD), "_PER"))),.(LABEL1, LABEL3), .SDcols = CAT:BIRD]` – akrun May 05 '20 at 18:59

1 Answers1

1

An option using data.table would be to group by 'LABEL1', 'LABEL3', specify the columns of interest in .SDcols, get the sum (as it binary columns) by looping over the .SD and concatenate with the weighted.mean based on the 'SAMPLEWEIGHT' column

library(data.table)
data[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")), 
  setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)), 
     paste0(names(.SD), "_PER"))),.(LABEL1, LABEL3), .SDcols = CAT:BIRD]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • @akun also if you have a chance to see https://stackoverflow.com/questions/61611000/r-summarize-collapsed-data-table/61611259?noredirect=1#comment108991548_61611259 – bvowe May 05 '20 at 19:40
  • 1
    @bvowe Please do check the values for weightedd.mean – akrun May 05 '20 at 19:40
  • 1
    This would be the `dplyr` equivalent (which is out of scope of the question, but still interesting to compare to the `data.table` approach): `data %>% group_by(LABEL1, LABEL3) %>% summarise_at(vars(CAT:BIRD), list(N = ~ sum(.), PER = ~ weighted.mean(. == 1, SAMPLEWEIGHT)))` – TimTeaFan May 05 '20 at 19:50