Filtering subjects below accuracy threshold in R

Question

I have a data frame (1) containing a list of subjects below a certain accuracy threshold (i.e., 50% incorrect). I have another data frame (2) containing all subjects (accurate and inaccurate) with all their data. Importantly, there are multiple rows per subject in this central data frame.

I need to remove the inaccurate subjects from the central data frame (2). How do I do this in R? I have already tried subset:

 filterdata <- subset(groupedmergedoutliers, subject == filtercorrectpercent$subject)

'groupedmergedoutliers' is the central subject data frame; 'filtercorrectpercent' is the inaccurate subjects data frame.

What have you tried? Where are you stuck? Do you know how to subset data frames with `[` or with `subset()`? — Gregor Thomas, Nov 21 '16 at 21:32
I have tried subset, but it appears to be filtering more values than it should. I think the problem has to do with the fact that the central data frame contains multiple rows for each subject but the inaccurate subject data frame contains only one-row per subject. — Sumer Vaid, Nov 21 '16 at 23:16
Well, how are you trying it? We can probably make a slight correction... — Gregor Thomas, Nov 21 '16 at 23:19
Here is the code: filterdata<-subset(groupedmergedoutliers, subject==filtercorrectpercent$subject) groupedmergedoutliers=central subject data frame ; filtercorrectpercent=inaccurate subjects data frame — Sumer Vaid, Nov 21 '16 at 23:20
Please edit the code into your question, it is not easy to read in the comments. — Gregor Thomas, Nov 21 '16 at 23:27
The question has been edited to reflect the code. Thanks a ton. — Sumer Vaid, Nov 21 '16 at 23:41

score 0 · Accepted Answer · answered Nov 21 '16 at 23:41

You are using ==, which tests for pairwise equality (e.g., is the first row of df1$subject equal to the first row of df2$subject, are the second rows equal, etc.). Consider:

c(1, 1, 2, 3) == c(1, 2, 3, 4)
# [1] TRUE FALSE FALSE FALSE
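The question mentions that the central data frame has several rows per subject while the list of inaccurate subjects has one row each, so the two columns differ in length. In that case R recycles the shorter vector before comparing position by position, which would explain the unexpected result. A minimal sketch with made-up vectors (`all_rows` and `flagged` are hypothetical names, not the asker's data):

```r
# Hypothetical toy vectors, not the asker's data
all_rows <- c(1, 1, 2, 2, 3, 3)  # several rows per subject
flagged  <- c(1, 3)              # one entry per flagged subject

# `==` recycles `flagged` to c(1, 3, 1, 3, 1, 3) and compares position by position,
# so a row only matches if the recycled value happens to line up with it
pairwise <- all_rows == flagged
print(pairwise)
# [1]  TRUE FALSE FALSE FALSE FALSE  TRUE
```

Only two of the four rows belonging to subjects 1 and 3 are matched here. If the longer length were not a multiple of the shorter one, R would also emit a warning.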

Instead, you want to be testing if each row of df1$subject is in any row of df2$subject. We can use %in% for this:

c(1, 1, 2, 3) %in% c(1, 2, 3, 4)
# [1] TRUE TRUE TRUE TRUE

filterdata <- subset(
    groupedmergedoutliers,
    subject %in% filtercorrectpercent$subject
)
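Note that a condition of the form `subject %in% ...` keeps the rows whose subject appears in the list. Since the question asks to remove the inaccurate subjects, the test would presumably need to be negated with `!`. A small sketch on hypothetical stand-in data frames (`central`, `inaccurate`, and the `score` column are made up for illustration) showing both directions:

```r
# Hypothetical stand-ins for the two data frames in the question
central <- data.frame(
    subject = c(1, 1, 2, 2, 3, 3),
    score   = c(10, 12, 7, 9, 4, 5)
)
inaccurate <- data.frame(subject = c(1, 3))

# Keep only rows whose subject appears in the inaccurate list
kept_listed <- subset(central, subject %in% inaccurate$subject)

# Drop those rows instead by negating the membership test
dropped_listed <- subset(central, !(subject %in% inaccurate$subject))

print(kept_listed$subject)
# [1] 1 1 3 3
print(dropped_listed$subject)
# [1] 2 2
```

The parentheses around the `%in%` expression are not strictly required, but they make it obvious that the negation applies to the whole membership test.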
