This is my df

df <- structure(structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), y = c(NA, NA, NA, NA, 1, NA, NA, NA, 1, 2, NA, NA, 1, 2, 3, NA, 2, 2, 3, 4, NA, 3, 3, 4, 5), x = c(1L, 2L, 3L, 4L,5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), .Names = c("group", "y", "x"), row.names = c(NA, 25L), class = "data.frame"))

> df
   group  y x
1      A NA 1
2      A NA 2
3      A NA 3
4      A NA 4
5      A  1 5
6      B NA 1
7      B NA 2
8      B NA 3
9      B  1 4
10     B  2 5
11     C NA 1
12     C NA 2
13     C  1 3
14     C  2 4
15     C  3 5
16     D NA 1
17     D  2 2
18     D  2 3
19     D  3 4
20     D  4 5
21     E NA 1
22     E  3 2
23     E  3 3
24     E  4 4
25     E  5 5

My goal is to calculate the mean per x value (across groups), using mutate. But first I'd like to filter the data so that only those values of x remain for which there are at least 3 non-NA values of y. So in this example I only want to keep the rows where x is at least 3. I can't figure out how to write the filter(). Any suggestions?

erc
  • I think the info is not clear. You said at least 3 non-NA values; is that in the `y` column? The `x` column doesn't have any NA values. If that is the case: `df %>% group_by(group) %>% filter(sum(!is.na(y))>=3) %>% mutate(Mean=mean(x, na.rm=TRUE))` – akrun Jan 16 '15 at 16:44
  • yep, sorry, that's what I meant. But your help brought me to the solution I was looking for: `df %>% group_by(x) %>% filter(sum(!is.na(y))>=3) %>% mutate(Mean=mean(x, na.rm=TRUE))` so thanks a lot! – erc Jan 16 '15 at 16:47
  • @beetroot Why would you group by `x`? Akrun's version seems more appropriate. Plus the column name is **group** – Rich Scriven Jan 16 '15 at 16:49
  • @RichardScriven If I group by group then those groups which do not meet the filter condition are removed entirely, if I group by x only the respective rows are removed. – erc Jan 16 '15 at 16:54
  • @beetroot Thanks for the clarification. But, it was not very clear from your post and by doing the `mutate`, wouldn't you get the same values as in the x column? – akrun Jan 16 '15 at 17:09

1 Answer

You could try

library(dplyr)

df %>% 
   group_by(group) %>%  # or group_by(x), as per the OP's clarification
   filter(sum(!is.na(y)) >= 3) %>%  # keep only groups with at least 3 non-NA y values
   mutate(Mean = mean(x, na.rm = TRUE))
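
For the variant discussed in the comments (grouping by `x` so that only the rows for sparse `x` values are dropped), here is a minimal sketch. It assumes the mean the OP wants is the mean of `y` within each `x`, since the mean of `x` grouped by `x` would just repeat `x`. The name `res` and the compact rebuild of `df` are only for illustration.

```r
library(dplyr)

# Same data as in the question, rebuilt compactly for a self-contained example
df <- data.frame(
  group = factor(rep(c("A", "B", "C", "D", "E"), each = 5)),
  y = c(NA, NA, NA, NA, 1,
        NA, NA, NA, 1, 2,
        NA, NA, 1, 2, 3,
        NA, 2, 2, 3, 4,
        NA, 3, 3, 4, 5),
  x = rep(1:5, times = 5)
)

res <- df %>%
  group_by(x) %>%                           # one group per x value, across all groups A-E
  filter(sum(!is.na(y)) >= 3) %>%           # keep x values with at least 3 non-NA y
  mutate(Mean = mean(y, na.rm = TRUE)) %>%  # assumption: the mean of y is what is wanted
  ungroup()
```

With this data the filter keeps the 15 rows where `x` is 3, 4 or 5, and `Mean` is 2, 2.5 and 3 for those `x` values respectively.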
akrun