Filter based on NA in dplyr

Question

This is my df

df <- structure(structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), y = c(NA, NA, NA, NA, 1, NA, NA, NA, 1, 2, NA, NA, 1, 2, 3, NA, 2, 2, 3, 4, NA, 3, 3, 4, 5), x = c(1L, 2L, 3L, 4L,5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), .Names = c("group", "y", "x"), row.names = c(NA, 25L), class = "data.frame"))

> df
   group  y x
1      A NA 1
2      A NA 2
3      A NA 3
4      A NA 4
5      A  1 5
6      B NA 1
7      B NA 2
8      B NA 3
9      B  1 4
10     B  2 5
11     C NA 1
12     C NA 2
13     C  1 3
14     C  2 4
15     C  3 5
16     D NA 1
17     D  2 2
18     D  2 3
19     D  3 4
20     D  4 5
21     E NA 1
22     E  3 2
23     E  3 3
24     E  4 4
25     E  5 5

My goal is to calculate the mean per x value (across groups), using mutate. But first I'd like to filter the data so that only those values of x remain for which there are at least 3 non-NA values of y. In this example, that means keeping only the rows for which x is at least 3. I can't figure out how to write the filter(). Any suggestions?

I think the question is not clear. You said at least 3 non-NA values; is that in the `y` column? The `x` column doesn't have any NA values. If that is the case: `df %>% group_by(group) %>% filter(sum(!is.na(y))>=3) %>% mutate(Mean=mean(x, na.rm=TRUE))` — akrun, Jan 16 '15 at 16:44
yep, sorry, that's what I meant. But your help brought me to the solution I was looking for: `df %>% group_by(x) %>% filter(sum(!is.na(y))>=3) %>% mutate(Mean=mean(x, na.rm=TRUE))` so thanks a lot! — erc, Jan 16 '15 at 16:47
@beetroot Why would you group by `x`? Akrun's seems more appropriate. Plus the column name is **group** — Rich Scriven, Jan 16 '15 at 16:49
@RichardScriven If I group by group then those groups which do not meet the filter condition are removed entirely, if I group by x only the respective rows are removed. — erc, Jan 16 '15 at 16:54
@beetroot Thanks for the clarification. But, it was not very clear from your post and by doing the `mutate`, wouldn't you get the same values as in the x column? — akrun, Jan 16 '15 at 17:09
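
To make the difference between the two groupings discussed in these comments concrete, here is a small sketch (not part of the original thread) that applies the same filter under each grouping and checks what survives. It rebuilds the example data with `data.frame()` instead of the `structure()` call, and the names `by_group` and `by_x` are only illustrative.

```r
library(dplyr)

# Same 25 rows as the question's df, rebuilt in a shorter form
df <- data.frame(
  group = factor(rep(c("A", "B", "C", "D", "E"), each = 5)),
  y = c(NA, NA, NA, NA, 1,  NA, NA, NA, 1, 2,  NA, NA, 1, 2, 3,
        NA, 2, 2, 3, 4,  NA, 3, 3, 4, 5),
  x = rep(1:5, times = 5)
)

# Grouping by `group` drops whole groups with fewer than 3 non-NA y values
by_group <- df %>%
  group_by(group) %>%
  filter(sum(!is.na(y)) >= 3) %>%
  ungroup()

# Grouping by `x` drops every row whose x value has fewer than 3 non-NA y values
by_x <- df %>%
  group_by(x) %>%
  filter(sum(!is.na(y)) >= 3) %>%
  ungroup()

unique(as.character(by_group$group))  # "C" "D" "E"
sort(unique(by_x$x))                  # 3 4 5
```

With this data, grouping by `group` keeps groups C, D and E in full, while grouping by `x` keeps the rows with x of 3, 4 or 5 from every group, which matches what the question asks for.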

akrun · Accepted Answer · 2015-01-16T17:08:13.810

10

You could try

library(dplyr)

df %>% 
   group_by(group) %>%               # or group_by(x), as per the OP's clarification
   filter(sum(!is.na(y)) >= 3) %>%   # keep groups with at least 3 non-NA y values
   mutate(Mean = mean(x, na.rm = TRUE))
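
As noted in the comments, taking `mean(x)` after grouping by `x` only reproduces the `x` column. If the intent is the mean of `y` for each value of `x` (an assumption based on the question's wording, not something the thread confirms), the pipeline might look like the following sketch. The data is rebuilt with `data.frame()` for brevity and `result` is an illustrative name.

```r
library(dplyr)

# Same 25 rows as the question's df, rebuilt in a shorter form
df <- data.frame(
  group = factor(rep(c("A", "B", "C", "D", "E"), each = 5)),
  y = c(NA, NA, NA, NA, 1,  NA, NA, NA, 1, 2,  NA, NA, 1, 2, 3,
        NA, 2, 2, 3, 4,  NA, 3, 3, 4, 5),
  x = rep(1:5, times = 5)
)

result <- df %>%
  group_by(x) %>%
  filter(sum(!is.na(y)) >= 3) %>%          # keep x values with at least 3 non-NA y
  mutate(Mean = mean(y, na.rm = TRUE)) %>% # assumption: average y, not x
  ungroup()

# One row per remaining x value with its mean of y:
# x = 3 gives 2, x = 4 gives 2.5, x = 5 gives 3
distinct(result, x, Mean)
```

Because `mutate()` is used rather than `summarise()`, every remaining row keeps its `group` and `y` values and gains the per-x mean as an extra column.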

edited Jan 16 '15 at 17:08

answered Jan 16 '15 at 16:48
