dplyr summarise keep NA if all summarised values are NA

Question

I want to use dplyr summarise to sum counts by groups. Specifically I want to remove NA values if not all summed values are NA, but if all summed values are NA, I want to display NA. For example:

name <- c("jack", "jack", "mary", "mary", "ellen", "ellen")
number <- c(1,2,1,NA,NA,NA)

df <- data.frame(name,number)

In this case I want the following result:

Jack = 3
Mary = 1
Ellen = NA

However if I set na.rm = F:

df %>% group_by(name) %>% summarise(number = sum(number, na.rm = F))

The result is:

Jack = 3
Mary = NA
Ellen = NA

And if i set na.rm = T:

df %>% group_by(name) %>% summarise(number = sum(number, na.rm = T))

The result is

Jack = 3
Mary = 1
Ellen = 0

How can I solve this so that the cases with numbers and NA's get a number as output, but the cases with only NA's get NA as output.

score 6 · Answer 1 · answered Feb 21 '19 at 11:05

6

We can have a if/else condition - if all the values in 'number are NA, then return NA or else get the sum

library(dplyr)
df %>% 
  group_by(name) %>% 
  summarise(number = if(all(is.na(number))) NA_real_ else sum(number, na.rm = TRUE))

answered Feb 21 '19 at 11:05

akrun

874,273
37
540
662

score 6 · Answer 2 · answered Feb 21 '19 at 11:23

I was struggling with the same thing, so I wrote a solution into the package hablar. Try:

library(hablar)

df %>% group_by(name) %>% 
  summarise(number = sum_(number))

which gives you:

# A tibble: 3 x 2
  name  number
  <fct>  <dbl>
1 ellen    NA 
2 jack      3.
3 mary      1.

not that the only syntax difference is sum_ which is a function that returns NA if all is NA, else removes NA and calcuules sum no-missing values.

dplyr summarise keep NA if all summarised values are NA

2 Answers2

Linked

Related