0

I'm facing an issue in R which I have described below. I need the sum and count (avoiding NA's) of all columns per group ID.

What I have:

  ID S1 S2
1 1  NA 1
2 1  5  2
3 1  1  3
4 2  2  7
5 3  4  NA
6 3  2  11

What I need

  ID S1.sum S2.sum S1.count S2.count
1 1  6      6      2        3   
2 2  2      7      1        1
3 3  6      11     2        1

Further edit: I have the exact same dataset but now there is an additional column. "T"

  ID S1 S2 T
1 1  NA 1  3
2 1  5  2  3
3 1  1  3  3
4 2  2  7  5
5 3  4  NA 2
6 3  2  11 2

Is it possible to get the count and sum like before but now only when S1/2 fulfills the condition (S1/2 <= T);

The resulting dataframe would look like:

  ID S1.sum S2.sum S1.count S2.count
1 1  1      6      1        3   
2 2  2      0      1        0
3 3  2      0      1        0
Syed Ahmed
  • 199
  • 1
  • 10

1 Answers1

1

You can use summarise() with across() in dplyr.

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(across(S1:S2, list(sum = ~ sum(.x, na.rm = T),
                               count = ~ sum(!is.na(.x)))))

# # A tibble: 3 x 5
#      ID S1_sum S1_count S2_sum S2_count
#   <int>  <int>    <int>  <int>    <int>
# 1     1      6        2      6        3
# 2     2      2        1      7        1
# 3     3      6        2     11        1

For your second question, just use .x[.x <= T] in the functions.

df %>%
  group_by(ID) %>%
  summarise(across(S1:S2, list(sum = ~ sum(.x[.x <= T], na.rm = T),
                               count = ~ sum(!is.na(.x[.x <= T])))))

# # A tibble: 3 x 5
#      ID S1_sum S1_count S2_sum S2_count
#   <int>  <int>    <int>  <int>    <int>
# 1     1      1        1      6        3
# 2     2      2        1      0        0
# 3     3      2        1      0        0

Data

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 3L, 3L), S1 = c(NA, 5L, 
1L, 2L, 4L, 2L), S2 = c(1L, 2L, 3L, 7L, NA, 11L), T = c(3L, 3L, 
3L, 5L, 2L, 2L)), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6"))
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51