I am looking for help retaining participants with incomplete data when filtering on the number of NA values. I am conducting a 25-week quasi-experiment in which an intervention occurs at Week 13. For my primary analysis I am including only participants with at least 3 weeks of measurements in each of the pre- and post-intervention periods. I was able to build my analytic sample using code from this link: Filter based on NA in dplyr
However, I can't obtain the correct number of participants with incomplete data, i.e., a set that adds back up to the original dataset when combined with the participants who have complete data. For example, when I apply the filter I keep about 2/3 of participants, but when I reverse the condition (i.e., remove ! from !is.na) I do not get the other 1/3. Here is the code I used to obtain my analytic sample, followed by the code I am trying to use to obtain participants with incomplete data:
BCData6 <- BCData5 %>%
  group_by(user_id) %>%
  filter(sum(!is.na(Average.Steps)[Intervention == 0]) >= 3) %>%
  filter(sum(!is.na(Average.Steps)[Intervention == 1]) >= 3)

NLData7 <- NLData5 %>%
  group_by(user_id) %>%
  filter(sum(is.na(Average.Steps)[Intervention == 0]) >= 3) %>%
  filter(sum(is.na(Average.Steps)[Intervention == 1]) >= 3)
Applying the first filter yields 348,075 observations out of the original 548,200. Removing ! yields a dataset with 182,450 observations, so the two together total 530,525, which is 17,675 short of the original sample size.
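To make the gap easier to reproduce, here is a minimal sketch on made-up data. The `toy` tibble, its three users, and the object names `complete` and `no_bang` are hypothetical and only mimic the structure of my data (25 weeks per user, `Intervention` switching to 1 at Week 13). It applies the same two filters as above and shows that a user can fall into neither result.

```r
library(dplyr)

# Hypothetical toy data: 3 users x 25 weeks, intervention from Week 13 onward.
weeks <- 1:25
toy <- tibble(
  user_id       = rep(c("a", "b", "c"), each = 25),
  Week          = rep(weeks, times = 3),
  Intervention  = rep(as.integer(weeks >= 13), times = 3),
  Average.Steps = 2000
)

# User "a": no missing weeks.
# User "b": every week missing.
toy$Average.Steps[toy$user_id == "b"] <- NA
# User "c": only 2 pre-intervention weeks observed, post period complete.
toy$Average.Steps[toy$user_id == "c" & toy$Week %in% 3:12] <- NA

# Same filter as for the analytic sample: >= 3 observed weeks in each period.
complete <- toy %>%
  group_by(user_id) %>%
  filter(sum(!is.na(Average.Steps)[Intervention == 0]) >= 3) %>%
  filter(sum(!is.na(Average.Steps)[Intervention == 1]) >= 3) %>%
  ungroup()

# Same filter with ! removed: >= 3 missing weeks in each period.
no_bang <- toy %>%
  group_by(user_id) %>%
  filter(sum(is.na(Average.Steps)[Intervention == 0]) >= 3) %>%
  filter(sum(is.na(Average.Steps)[Intervention == 1]) >= 3) %>%
  ungroup()

nrow(toy)       # 75
nrow(complete)  # 25 (user "a")
nrow(no_bang)   # 25 (user "b")
```

In this toy case the two results sum to 50 of 75 rows. The missing 25 rows belong to user "c", who has fewer than 3 observed pre-intervention weeks but also fewer than 3 missing post-intervention weeks, so neither filter keeps them. I suspect something similar accounts for the 17,675 rows in my real data, but I have not confirmed it.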
Any guidance would be greatly appreciated!
EDIT
> dput(NLData6[1:25,c(9,10)])
structure(list(Week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), Average.Steps = c(2124,
3115, 2325, 2586, 4273, 3981, 5716, 4724, 3948, 1531, 1539, 4166,
2016, 2453, 1700, 1903, 1546, 2139, 1765, 1608, 2416, 2254, 2136,
1827, 1906)), row.names = c(NA, -25L), class = c("tbl_df", "tbl",
"data.frame"))
Please forgive my naivety; I'm still figuring out RStudio itself, along with the customs of Stack Overflow and Cross Validated.