I have the following data frame, which is a subset of a much larger one containing over 3 million rows.
df <- data.frame(Group = c(1,1,1,2,2,3,3,3,2,2,4,4,1,4,1,3,1,3,2,4,2,1,3,2,4),
SubGroup = c("A","A","C","B","C","A","B","C","C","A","B","C","C","A","B","C","A","A","B","A","C","C","B","B","C"),
Type = c("Z","Y","Z","X","X","Y","Z","X","Z","Y","Y","Y","X","Y","Z","X","X","Z","Y","X","Z","Z","Y","X","Y"),
Value = c(4,7,2,8,4,6,1,8,3,6,3,6,3,5,6,7,3,2,7,4,9,4,7,8,1),
Dup = c(1,1,0,0,0,1,0,1,1,0,1,0,0,1,0,1,0,1,0,1,1,0,1,1,1))
Per Group, there are five different formulas that I want to calculate. 1) A sum of Value by SubGroup 2) A sum of Value by SubGroup and Dup 3) A sum of Value by SubGroup and Type Z 4) A sum of Value by SubGroup and Type Y 5) A sum of Value by SubGroup and Type X
Here is the code that I currently have using dplyr:
result <- df %>%
group_by(Group) %>%
summarise(SubGroup.A.Total = sum(Value[SubGroup=="A"]),
SubGroup.A.Dup = sum(Value[SubGroup=="A" & Dup==1]),
SubGroup.A.TypeZ = sum(Value[SubGroup=="A" & Type=="Z"]),
SubGroup.A.TypeY = sum(Value[SubGroup=="A" & Type=="Y"]),
SubGroup.A.TypeX = sum(Value[SubGroup=="A" & Type=="X"]),
SubGroup.B.Total = sum(Value[SubGroup=="B"]),
SubGroup.B.Dup = sum(Value[SubGroup=="B" & Dup==1]),
SubGroup.B.TypeZ = sum(Value[SubGroup=="B" & Type=="Z"]),
SubGroup.B.TypeY = sum(Value[SubGroup=="B" & Type=="Y"]),
SubGroup.B.TypeX = sum(Value[SubGroup=="B" & Type=="X"]),
SubGroup.C.Total = sum(Value[SubGroup=="C"]),
SubGroup.C.Dup = sum(Value[SubGroup=="C" & Dup==1]),
SubGroup.C.TypeZ = sum(Value[SubGroup=="C" & Type=="Z"]),
SubGroup.C.TypeY = sum(Value[SubGroup=="C" & Type=="Y"]),
SubGroup.C.TypeX = sum(Value[SubGroup=="C" & Type=="X"]))
However, I really want to know if there is a more efficient way of doing this, both from a run time and number of lines perspective? Since I am in essence looping the same five formulas by SubGroup, I imagine there is a much simpler way of writing this.