I want to aggregate some columns of a data frame using a factor (group
in the example) but I want to use only the rows with the highest values in a different column (time
in the example)
df=data.frame(group=c(rep('a',5),rep('b',5)),
time=c(1:5,2:6),
V1=c(1,1,1,2,2,1,1,1,1,1),
V2=c(2,2,1,1,1,1,1,1,1,5))
I know how to do it using ddply
but it's pretty slow
ddply(df,'group',summarize,
V1=sum(V1[order(time,decreasing = T)[1:2]]),
V2=sum(V2[order(time,decreasing = T)[1:2]]))
"group" "V1" "V2"
"a" 4 2
"b" 2 6
Is there a faster way to do it (aggregate
or data.table
)?