Use group_by to filter specific cases while keeping NAs

Question

I want to filter my dataset to keep only the groups (deid) that have at least one observation in a specific column. To illustrate:

help <- data.frame(deid = c(5, 5, 5, 5, 5, 12, 12, 12, 12, 17, 17, 17),
               score.a = c(NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, NA))

This creates:

   deid score.a
1     5      NA
2     5       1
3     5       1
4     5       1
5     5      NA
6    12      NA
7    12      NA
8    12      NA
9    12      NA
10   17      NA
11   17       1
12   17      NA

I want dplyr to keep every deid group that has any observations in score.a, including that group's NA rows. Thus, I want it to return:

  deid score.a
1     5      NA
2     5       1
3     5       1
4     5       1
5     5      NA
6    17      NA
7    17       1
8    17      NA

I ran `help %>% group_by(deid) %>% filter(score.a > 0)`, but it removes the NA rows as well. Thank you for any assistance.
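A minimal sketch of why this happens, rebuilding the example data from above: `score.a > 0` is evaluated row by row, `NA > 0` is `NA` rather than `FALSE`, and `filter()` keeps only rows whose condition is `TRUE`. The names `cond` and `rowwise_result` are illustrative only.

```r
library(dplyr)

help <- data.frame(deid = c(5, 5, 5, 5, 5, 12, 12, 12, 12, 17, 17, 17),
                   score.a = c(NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, NA))

# Row-wise comparison: every NA in score.a yields NA, not FALSE
cond <- help$score.a > 0

# filter() drops rows whose condition is NA, so the NA rows disappear
# even inside groups (5 and 17) that do contain a 1
rowwise_result <- help %>%
  group_by(deid) %>%
  filter(score.a > 0)
```

Because the condition is row-level, grouping has no effect here; a group-level test such as `any(...)` is needed to keep or drop whole groups.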

Edit: A similar question was asked here: How to remove groups of observation with dplyr::filter(). However, the answer there uses the 'all' condition, whereas this problem requires the 'any' condition.

possible duplicate of [How to remove groups of observation with dplyr::filter()](http://stackoverflow.com/questions/24583152/how-to-remove-groups-of-observation-with-dplyrfilter) — shadowtalker, Jun 13 '15 at 16:08
Suppose if `help$score.a[3] <- -1`, what would be the result? — akrun, Jun 13 '15 at 16:08
Not sure I'm following your question, @akrun ... sorry. When I run that the 3rd row of score.a is -1 — b222, Jun 13 '15 at 16:17
My question is: suppose one of the elements in `score.a` for a particular group 'deid' is less than 0. Would you then select that group? — akrun, Jun 13 '15 at 16:19
Yes. Correct. Essentially any observation in score.a and I want to keep that group. Although with the present data all observations are >= 0 — b222, Jun 13 '15 at 16:20
So, my first solution should work. I added the second solution as a bonus. — akrun, Jun 13 '15 at 16:23
Yes! That worked... I ran the following code... `help %>% group_by(deid) %>% filter(score.a = any(!is.na(score.a)))` — b222, Jun 13 '15 at 16:23
But, that code is not checking the condition `score.a >0` Check my update — akrun, Jun 13 '15 at 16:27

akrun · Accepted Answer · 2015-06-13T16:25:10.660

Try

library(dplyr)
help %>%
  group_by(deid) %>%
  # keep a deid group if any non-missing score.a in it is > 0
  filter(any(score.a > 0 & !is.na(score.a)))
#    deid score.a
#1    5      NA
#2    5       1
#3    5       1
#4    5       1
#5    5      NA
#6   17      NA
#7   17       1
#8   17      NA
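A minimal sketch comparing the guard used above with an alternative that is not part of the original answer: `any(..., na.rm = TRUE)` drops the NAs before testing, and `any()` over zero values is `FALSE`, so the all-NA group 12 is dropped in the same way. The names `with_guard` and `with_na_rm` are illustrative.

```r
library(dplyr)

help <- data.frame(deid = c(5, 5, 5, 5, 5, 12, 12, 12, 12, 17, 17, 17),
                   score.a = c(NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, NA))

# The answer's guard: NA > 0 is NA, and NA & FALSE is FALSE
with_guard <- help %>%
  group_by(deid) %>%
  filter(any(score.a > 0 & !is.na(score.a)))

# Assumed-equivalent guard: remove NAs inside any() instead
with_na_rm <- help %>%
  group_by(deid) %>%
  filter(any(score.a > 0, na.rm = TRUE))
```

Both forms keep the same eight rows; `na.rm = TRUE` is simply a shorter way of saying the same thing.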

Or, a similar approach with data.table:

library(data.table)
# for each deid, return the group's rows (.SD) only if it has a non-missing value > 0
setDT(help)[, if (any(score.a > 0 & !is.na(score.a))) .SD, by = deid]
#    deid score.a
#1:    5      NA
#2:    5       1
#3:    5       1
#4:    5       1
#5:    5      NA
#6:   17      NA
#7:   17       1
#8:   17      NA
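A minimal sketch of why the `!is.na()` guard matters more in the data.table version: without it, the all-NA group 12 makes `any()` return `NA`, and `if (NA)` raises an error instead of skipping the group. The `na.rm = TRUE` variant is an alternative, not the answer's code; `help_dt`, `err` and `kept` are illustrative names.

```r
library(data.table)

help_dt <- data.table(deid = c(5, 5, 5, 5, 5, 12, 12, 12, 12, 17, 17, 17),
                      score.a = c(NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, NA))

# No NA guard: group 12 gives any(NA) = NA, so if () errors
err <- tryCatch(help_dt[, if (any(score.a > 0)) .SD, by = deid],
                error = function(e) conditionMessage(e))

# Guarded with na.rm = TRUE: any() over zero values is FALSE, group 12 is skipped
kept <- help_dt[, if (any(score.a > 0, na.rm = TRUE)) .SD, by = deid]
```

In `filter()` an `NA` condition is silently treated as "drop", which is why the dplyr version would tolerate the missing guard while `if ()` does not.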

If instead the condition is to keep only the 'deid' groups in which every non-missing value of 'score.a' is > 0 (and at least one value is present), the code above can be modified to:

setDT(help)[, if (!all(is.na(score.a)) && all(score.a[!is.na(score.a)] > 0)) .SD, by = deid]
#   deid score.a
#1:    5      NA
#2:    5       1
#3:    5       1
#4:    5       1
#5:    5      NA
#6:   17      NA
#7:   17       1
#8:   17      NA

Suppose one of the 'score.a' values in a 'deid' group is less than 0:

help$score.a[3] <- -1

Then the code above returns:

setDT(help)[, if (!all(is.na(score.a)) && all(score.a[!is.na(score.a)] > 0)) .SD, by = deid]
#    deid score.a
#1:   17      NA
#2:   17       1
#3:   17      NA
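A minimal sketch of the same "all observed values > 0" rule in dplyr, checked on both the original data and the `score.a[3] <- -1` modification. This translation and the helper name `keep_all_positive` are illustrative, not from the answer.

```r
library(dplyr)

help <- data.frame(deid = c(5, 5, 5, 5, 5, 12, 12, 12, 12, 17, 17, 17),
                   score.a = c(NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, NA))

# Hypothetical helper: keep a group only if it has at least one observed value
# and every observed value is > 0 (na.rm = TRUE ignores the NAs inside all())
keep_all_positive <- function(d) {
  d %>%
    group_by(deid) %>%
    filter(any(!is.na(score.a)) & all(score.a > 0, na.rm = TRUE)) %>%
    ungroup()
}

res1 <- keep_all_positive(help)

help_neg <- help
help_neg$score.a[3] <- -1   # the modification used in the answer
res2 <- keep_all_positive(help_neg)
```

The `any(!is.na(...))` term is what excludes group 12: `all()` over zero values is `TRUE`, so the all-NA group would otherwise pass.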

score 2 · Answer 2 · answered Jun 13 '15 at 16:25


library(dplyr)
help %>% group_by(deid) %>% filter(sum(score.a, na.rm = TRUE) > 0)

Shenglin Chen

This would fail if `help$score.a[2] <- -0.2; help$score.a[3] <- 0.2; help$score.a[4] <- 0` – akrun Jun 13 '15 at 16:34
Interesting: I changed it to `df%>%group_by(deid)%>%filter(sum(!is.na(score.a))>0)` and it works, but I don't know why. – Shenglin Chen Jun 13 '15 at 16:54
But that is again not correct: with `help$score.a[2:4] <- -0.2`, all values are negative for deid `5`, but it still returns that group. – akrun Jun 13 '15 at 17:35
`df%>%group_by(deid)%>%filter(sum(!is.na(score.a))!=0)` – Shenglin Chen Jun 13 '15 at 17:58
I don't think that is correct either: with `df$score.a[11] <- -0.5`, it still returns the 17 group. – akrun Jun 13 '15 at 18:06
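A minimal sketch of the counterexamples discussed in these comments, comparing the sum-based and count-based filters with an `any()` test. In the first case, elements 2 to 4 of group 5 are set to -0.2, 0.2 and 0; in the second, all three are -0.2. Variable names are illustrative.

```r
library(dplyr)

help <- data.frame(deid = c(5, 5, 5, 5, 5, 12, 12, 12, 12, 17, 17, 17),
                   score.a = c(NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, NA))

# Case 1: group 5 contains a positive value (0.2), but its observed values sum to 0
help_ce <- help
help_ce$score.a[2:4] <- c(-0.2, 0.2, 0)
by_sum <- help_ce %>% group_by(deid) %>% filter(sum(score.a, na.rm = TRUE) > 0)
by_any <- help_ce %>% group_by(deid) %>% filter(any(score.a > 0, na.rm = TRUE))

# Case 2: every observed value in group 5 is negative
help_neg <- help
help_neg$score.a[2:4] <- -0.2
# counting non-missing values only tests "has an observation", not "> 0"
by_count <- help_neg %>% group_by(deid) %>% filter(sum(!is.na(score.a)) > 0)
by_any2  <- help_neg %>% group_by(deid) %>% filter(any(score.a > 0, na.rm = TRUE))
```

The sum wrongly drops group 5 in case 1, and the count wrongly keeps it in case 2; note that the count-based filter is exactly the "has any non-missing value" rule, so it is correct only if the sign of `score.a` does not matter.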
