Divide group sum by total sum

Question

I am using the dplyr package. Let's suppose I have the below table.

Group	count
A	20
A	10
B	30
B	35
C	50
C	60

My goal is to create a summary table that contains the mean of each group and also each group's mean as a proportion of the sum of all group means. The final table should look like this:

Group	avg	prcnt_of_total
A	15	0.146
B	32.5	0.317
C	55	0.537

For example, 0.146 is the result of the following calculation: 15/(15+32.5+55)

So far, I have only been able to write the code for the first column, which calculates the mean for each group:

summary_df <- df %>%
  group_by(Group) %>%
  summarise(avg = mean(count))

I still don't know how to produce the prcnt_of_total column. Any suggestions?

score 4 · Accepted Answer · answered Jul 14 '22 at 15:13

You can use the following code:

df <- read.table(text="Group    count
A   20
A   10
B   30
B   35
C   50
C   60", header = TRUE)

library(dplyr)
df %>%
  group_by(Group) %>%
  summarise(avg = mean(count)) %>%
  ungroup() %>%
  mutate(prcnt_of_total = prop.table(avg))
#> # A tibble: 3 × 3
#>   Group   avg prcnt_of_total
#>   <chr> <dbl>          <dbl>
#> 1 A      15            0.146
#> 2 B      32.5          0.317
#> 3 C      55            0.537

Created on 2022-07-14 by the reprex package (v2.0.1)

Quinten

That solves it without going out of the sequence. Thanks! – GitZine Jul 14 '22 at 15:14

Or `mutate(prcnt_of_total = avg/sum(avg))`, even if I like the use of `prop.table` – harre Jul 14 '22 at 15:15

Upvoted, but from the documentation: *"Note: `prop.table` is an earlier name, retained for back-compatibility."* The newer name for this function is `proportions`. – Rui Barradas Jul 14 '22 at 15:18
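
The three spellings mentioned in this thread should give the same numbers. Below is a minimal base R sketch to check that, assuming R 4.0.1 or later (where `proportions()` was introduced). The vector `avg` and the names `p_old`, `p_new`, and `p_manual` are for illustration only; the values are the group means from the answer's output.

```r
# Group means taken from the answer's output
avg <- c(A = 15, B = 32.5, C = 55)

p_old    <- prop.table(avg)    # earlier name, retained for back-compatibility
p_new    <- proportions(avg)   # newer name (assumes R >= 4.0.1)
p_manual <- avg / sum(avg)     # explicit division, as in harre's comment

# All three should agree, and the shares should sum to 1
all.equal(p_old, p_new)
all.equal(p_old, p_manual)
sum(p_new)
```

Any of the three can be used inside `mutate()` in the same way; the choice is mostly a matter of readability.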

score 1 · Answer 2 · answered Jul 14 '22 at 15:30

We can drop the grouping in `summarise()` itself with `.groups = "drop"`, so a separate `ungroup()` is not needed.

library(dplyr)

df1 %>% 
  group_by(Group) %>% 
  summarise(avg = mean(count), .groups = "drop") %>% 
  mutate(prcnt_of_total = avg/sum(avg))
#> # A tibble: 3 x 3
#>   Group   avg prcnt_of_total
#>   <chr> <dbl>          <dbl>
#> 1 A      15            0.146
#> 2 B      32.5          0.317
#> 3 C      55            0.537

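On another note, I am not sure that dividing each group's average by the sum of the averages is a meaningful metric unless every group is guaranteed to have the same number of entries. For that reason, here is a second solution that computes each group's share of the total count instead; it gives the same numbers as above only when the groups are equally sized.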

## if you always have the same number of rows between the groups
df1 %>% 
  group_by(Group) %>% 
  summarise(avg = mean(count),
            prcnt_of_total = sum(count)/sum(.$count)) 
#> # A tibble: 3 x 3
#>   Group   avg prcnt_of_total
#>   <chr> <dbl>          <dbl>
#> 1 A      15            0.146
#> 2 B      32.5          0.317
#> 3 C      55            0.537
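
To illustrate why the group sizes matter, here is a small sketch on hypothetical unbalanced data (group C gets a third row). The data frame `df_unbalanced` and the column names `share_of_avgs` and `share_of_total` are made up for this example and do not come from the question.

```r
library(dplyr)

# Hypothetical unbalanced data: group C has three rows instead of two
df_unbalanced <- data.frame(
  Group = c("A", "A", "B", "B", "C", "C", "C"),
  count = c(20, 10, 30, 35, 50, 60, 70)
)

res <- df_unbalanced %>%
  group_by(Group) %>%
  summarise(avg   = mean(count),
            total = sum(count),
            .groups = "drop") %>%
  mutate(share_of_avgs  = avg / sum(avg),      # mean divided by the sum of means
         share_of_total = total / sum(total))  # group sum divided by the grand sum

res
```

With unequal group sizes the two share columns no longer coincide, so which one is appropriate depends on what the percentage is meant to describe.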

Data:

df1 <- read.table(text = "Group count
                          A     20
                          A     10
                          B     30
                          B     35
                          C     50
                          C     60",
                  header = TRUE, stringsAsFactors = FALSE)

langtang · Answer 3 · score 1 · 2022-07-14T15:41:14.840

You can do this:

df %>% 
  group_by(Group) %>%
  summarize(avg = mean(count), prcent_of_total = sum(count)/sum(df$count))

Output:

  Group   avg prcent_of_total
  <chr> <dbl>           <dbl>
1 A      15             0.146
2 B      32.5           0.317
3 C      55             0.537

data.table is similar:

library(data.table)

setDT(df)[, .(avg = mean(count), prcent_of_total = sum(count) / sum(df$count)), by = Group]

edited Jul 14 '22 at 15:41

answered Jul 14 '22 at 15:35

langtang

Does dplyr automatically apply the function to the whole column if we write `data.frame$column`? Does it ignore any current grouping? – GitZine Jul 14 '22 at 15:37

Note that my approach is the same as @M--'s second solution, and yes, by referring to `df`, the denominator is calculated over the entire frame (not by group). – langtang Jul 14 '22 at 15:40
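
The difference between a bare column name and `df$count` inside a grouped `summarise()` can be seen side by side in the sketch below. It rebuilds the question's data as `df`; the column names `group_sum` and `grand_sum` are illustrative only.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  Group = c("A", "A", "B", "B", "C", "C"),
  count = c(20, 10, 30, 35, 50, 60)
)

totals <- df %>%
  group_by(Group) %>%
  summarise(group_sum = sum(count),     # bare column: evaluated within each group
            grand_sum = sum(df$count),  # df$count: the full column, grouping is ignored
            .groups = "drop")

totals
```

Here `group_sum` differs per group, while `grand_sum` is the same value on every row because `df$count` always refers to the original, ungrouped column.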
