How to sum a variable by group so that the result is NA, not zero, when every value in the group is NA?

Question

I have data like the following example:

| a | b |
|---|---|
| 1 | 1 |
| 1 | 0 |
| 2 | 1 |
| 2 | NA|
| 3 | 0 |
| 4 | NA|
| 4 | NA|
| 4 | NA|
| 5 | 1 |
| 5 | NA|
| 5 | 0 |
| 5 | 1 |
| 6 | 0 |

I need to create a new data frame by summing b within each group of a. If every value in a group is NA, the output should be NA instead of zero, like this:

| a | b |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | NA|
| 5 | 2 |
| 6 | 0 |

How can I structure a sum in R to behave like this?

Thank you

Can you please add a dput of your initial data? – Bruno, Aug 15 '21 at 22:41

score 2 · Answer 1 · answered Aug 15 '21 at 22:45

A base R option using aggregate

aggregate(. ~ a,
  df, 
  function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm = TRUE)),
  na.action = na.pass
)

gives

  a  b
1 1  1
2 2  1
3 3  0
4 4 NA
5 5  2
6 6  0

data

> dput(df)
structure(list(a = c(1L, 1L, 2L, 2L, 3L, 4L, 4L, 4L, 5L, 5L, 
5L, 5L, 6L), b = c(1L, 0L, 1L, NA, 0L, NA, NA, NA, 1L, NA, 0L,
1L, 0L)), class = "data.frame", row.names = c(NA, -13L))
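
The same idea can be written with a named helper, which makes the all-NA check explicit and avoids `ifelse` on a length-one condition. This is a minimal sketch, not part of the original answer: the helper name `sum_or_na` is hypothetical, and `NA_integer_` assumes an integer column such as `b` in the example data.

```r
# Example data, copied from the dput() output above.
df <- data.frame(
  a = c(1L, 1L, 2L, 2L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L),
  b = c(1L, 0L, 1L, NA, 0L, NA, NA, NA, 1L, NA, 0L, 1L, 0L)
)

# Hypothetical helper (not from the original answer): returns NA when every
# value is missing, otherwise the sum of the non-missing values.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE)
}

# na.action = na.pass keeps rows containing NA so the helper can see them;
# the formula method's default (na.omit) would drop those rows first.
res <- aggregate(b ~ a, data = df, FUN = sum_or_na, na.action = na.pass)
print(res)
```

With the example data, group 4 (all NA) comes out as NA, while group 5 sums its non-missing values to 2.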

score 1 · Answer 2 · answered Aug 15 '21 at 23:07

Using sum_ from hablar

library(dplyr)
library(hablar)
df %>% 
    group_by(a) %>%
    summarise(b = sum_(b))

-output

# A tibble: 6 x 2
      a     b
  <int> <int>
1     1     1
2     2     1
3     3     0
4     4    NA
5     5     2
6     6     0

data

df <- structure(list(a = c(1L, 1L, 2L, 2L, 3L, 4L, 4L, 4L, 5L, 5L, 
5L, 5L, 6L), b = c(1L, 0L, 1L, NA, 0L, NA, NA, NA, 1L, NA, 0L,
1L, 0L)), class = "data.frame", row.names = c(NA, -13L))
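
If adding hablar as a dependency is not desirable, the same rule can be spelled out inside `summarise` with plain dplyr. This is a sketch under that assumption, not part of the original answer; `NA_integer_` is used so both branches match the integer type of `b`.

```r
library(dplyr)

# Example data, same as the data block above.
df <- data.frame(
  a = c(1L, 1L, 2L, 2L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L),
  b = c(1L, 0L, 1L, NA, 0L, NA, NA, NA, 1L, NA, 0L, 1L, 0L)
)

res <- df %>%
  group_by(a) %>%
  summarise(
    # Per group: NA if every value is missing, else the sum ignoring NAs.
    b = if (all(is.na(b))) NA_integer_ else sum(b, na.rm = TRUE)
  )
print(res)
```

The `if`/`else` is evaluated once per group, so the all-NA check applies to each group's values of `b` rather than to the whole column.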
