Aggregate and count observations of multiple columns per group R

Question

I'm facing an issue in R which I have described below. I need the sum and count (avoiding NA's) of all columns per group ID.

What I have:

What I need

  ID S1.sum S2.sum S1.count S2.count
1 1  6      6      2        3   
2 2  2      7      1        1
3 3  6      11     2        1

Further edit: I have the exact same dataset but now there is an additional column. "T"

  ID S1 S2 T
1 1  NA 1  3
2 1  5  2  3
3 1  1  3  3
4 2  2  7  5
5 3  4  NA 2
6 3  2  11 2

Is it possible to get the count and sum like before but now only when S1/2 fulfills the condition (S1/2 <= T);

The resulting dataframe would look like:

  ID S1.sum S2.sum S1.count S2.count
1 1  1      6      1        3   
2 2  2      0      1        0
3 3  2      0      1        0

Darren Tsai · Accepted Answer · 2020-08-11T13:06:37.340

1

You can use summarise() with across() in dplyr.

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(across(S1:S2, list(sum = ~ sum(.x, na.rm = T),
                               count = ~ sum(!is.na(.x)))))

# # A tibble: 3 x 5
#      ID S1_sum S1_count S2_sum S2_count
#   <int>  <int>    <int>  <int>    <int>
# 1     1      6        2      6        3
# 2     2      2        1      7        1
# 3     3      6        2     11        1

For your second question, just use .x[.x <= T] in the functions.

df %>%
  group_by(ID) %>%
  summarise(across(S1:S2, list(sum = ~ sum(.x[.x <= T], na.rm = T),
                               count = ~ sum(!is.na(.x[.x <= T])))))

# # A tibble: 3 x 5
#      ID S1_sum S1_count S2_sum S2_count
#   <int>  <int>    <int>  <int>    <int>
# 1     1      1        1      6        3
# 2     2      2        1      0        0
# 3     3      2        1      0        0

Data

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 3L), S1 = c(NA, 5L, 
1L, 2L, 4L, 2L), S2 = c(1L, 2L, 3L, 7L, NA, 11L), T = c(3L, 3L, 
3L, 5L, 2L, 2L)), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6"))

edited Aug 11 '20 at 13:06

answered Aug 04 '20 at 17:08

Darren Tsai

32,117
5
21
51

Works like a charm. – Syed Ahmed Aug 04 '20 at 17:15
1

@SyedAhmed And you can add `.names = "{col}.{fn}"` inside `across()` function to have columns named with dots rather than underscores like "S1.sum" etc. – Aug 04 '20 at 17:16
Is there any way to include another column which calculates the average too? Like, sum/count. – Syed Ahmed Aug 04 '20 at 17:18
1

@SyedAhmed of course! Just add `mean = ~ mean(.x, na.rm = T)` in the list. – Darren Tsai Aug 04 '20 at 17:19
Works perfectly. – Syed Ahmed Aug 04 '20 at 17:25
Hi @DarrenTsai, is there a way to do the exact same thing above only if satisfies a specific condition. For example count/sum only if the row value is greater than another column value. Check edit. – Syed Ahmed Aug 11 '20 at 12:47
1

@SyedAhmed Done! Please check. – Darren Tsai Aug 11 '20 at 13:07

Aggregate and count observations of multiple columns per group R

1 Answers1