1

is there an analog to SQL analytic so that one could do aggregation without collapsing rows? For example, I want to do a sum for each group without GROUP BY, in SQL I can do this:

select group, x, sum(x) over(partition by group) group_sum
from mytable

I would like to be able to do something similar in R:

df <- data.frame(group=c('a', 'a', 'b', 'b'), x=c(1, 3, 10, 30))
df %>% mutate(group_sum = window_aggr(group_by=group, func=sum))

group x group_sum
a     1    4
a     3    4
b    10   10
b    30   40

where window_aggr is just a made-up function.

So is there a way to implement this in a single pipeline, without doing an actual aggregation and a join?

Thanks!

Best regards, Nikolai

Nikolai
  • 65
  • 5

1 Answers1

1

You can use ave which will in this case calculate the sum of df$x for the groups df$group.

df$group_sum <- ave(df$x, df$group, FUN=sum)

df
#  group  x group_sum
#1     a  1         4
#2     a  3         4
#3     b 10        40
#4     b 30        40

Or using base pipes:

df |> transform(group_sum = ave(x, group, FUN=sum))
#  group  x group_sum
#1     a  1         4
#2     a  3         4
#3     b 10        40
#4     b 30        40

Or using dplyr

library(dplyr)
df %>% mutate(group_sum = ave(x, group, FUN=sum))
#  group  x group_sum
#1     a  1         4
#2     a  3         4
#3     b 10        40
#4     b 30        40
GKi
  • 37,245
  • 2
  • 26
  • 48
  • Thank you! Is there a way to implement this in a dplyr pipeline? – Nikolai Jul 13 '21 at 13:59
  • 1
    I added a way using *base* pipes and also using *dplyr*. – GKi Jul 13 '21 at 14:08
  • 1
    In dplyr it would normally be done like this: `df %>% group_by(group) %>% mutate(group_sum = sum(x)) %>% ungroup` – G. Grothendieck Jul 13 '21 at 14:15
  • Yes sorry I usually don't use *dplyr*. On the other hand it was asked to *do a sum for each group without GROUP BY* and so I simple took the template from the question. – GKi Jul 13 '21 at 14:21
  • I think the question was referring to group by in sql rather than in dplyr. – G. Grothendieck Jul 13 '21 at 15:09
  • Actually I can't make the dplyr version to work. I'm getting an "x unique applies only to vectors" error. – Nikolai Jul 13 '21 at 15:11
  • @G.Grothendieck Yes that's also possible. It looks like that MonJeanJean had the same impression and deleted the answer with exact the same method. – GKi Jul 13 '21 at 15:12
  • Sorry, my mistake, typed "FUNC" instead of "FUN", that was causing the error. – Nikolai Jul 13 '21 at 15:17
  • @Nikolai I don't get this error. What versions do you have? I use R 4.1.0 and dplyr_1.0.7. – GKi Jul 13 '21 at 15:18
  • @GKi see my comment above -- I was using FUNC instead of FUN – Nikolai Jul 13 '21 at 15:51