I have a data frame listing the subjects who fall below a certain accuracy threshold (i.e., 50% incorrect). I have a second, central data frame containing all subjects (accurate and inaccurate) with all their data. Importantly, the central data frame has multiple rows per subject.

I need to remove the inaccurate subjects from the central data frame. How do I do this in R? I have already tried subset:

 filterdata <- subset(groupedmergedoutliers, subject == filtercorrectpercent$subject)

'groupedmergedoutliers' is the central subject data frame; 'filtercorrectpercent' is the inaccurate subjects data frame.

Sumer Vaid
  • What have you tried? Where are you stuck? Do you know how to subset data frames with `[` or with `subset()`? – Gregor Thomas Nov 21 '16 at 21:32
  • I have tried subset, but it appears to be filtering more values than it should. I think the problem has to do with the fact that the central data frame contains multiple rows for each subject, but the inaccurate subject data frame contains only one row per subject. – Sumer Vaid Nov 21 '16 at 23:16
  • Well, how are you trying it? We can probably make a slight correction... – Gregor Thomas Nov 21 '16 at 23:19
  • Here is the code: filterdata<-subset(groupedmergedoutliers, subject==filtercorrectpercent$subject) groupedmergedoutliers=central subject data frame ; filtercorrectpercent=inaccurate subjects data frame – Sumer Vaid Nov 21 '16 at 23:20
  • Please edit the code into your question, it is not easy to read in the comments. – Gregor Thomas Nov 21 '16 at 23:27
  • The question has been edited to reflect the code. Thanks a ton. – Sumer Vaid Nov 21 '16 at 23:41

1 Answer

You are using ==, which tests for element-wise equality (e.g., is the first element of df1$subject equal to the first element of df2$subject, are the second elements equal, etc.), recycling the shorter vector when the lengths differ. Consider

c(1, 1, 2, 3) == c(1, 2, 3, 4)
# [1] TRUE FALSE FALSE FALSE
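
The comparison becomes more misleading when the two vectors differ in length, as they do in the question, because R recycles the shorter one. A minimal sketch with made-up subject IDs (the names `central_subjects` and `inaccurate` are hypothetical, not from the question):

```r
# Hypothetical data: six rows in the central frame, two inaccurate subjects
central_subjects <- c("s1", "s1", "s2", "s2", "s3", "s3")
inaccurate <- c("s2", "s3")

# `==` recycles `inaccurate` to s2, s3, s2, s3, s2, s3 and compares by position,
# so s2 and s3 each match on only one of their two rows
by_position <- central_subjects == inaccurate
print(by_position)
```

Here only the third and sixth elements come back TRUE, even though four of the six rows belong to an inaccurate subject.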

Instead, you want to test whether each value of df1$subject appears anywhere in df2$subject. We can use %in% for this:

c(1, 1, 2, 3) %in% c(1, 2, 3, 4)
# [1] TRUE TRUE TRUE TRUE

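# filtercorrectpercent lists the inaccurate subjects, so negate the
# test with ! to remove them (drop the ! to keep only those subjects)
filterdata <- subset(
    groupedmergedoutliers,
    !(subject %in% filtercorrectpercent$subject)
)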
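
Since the goal in the question is to remove the inaccurate subjects, the `%in%` test needs to be negated with `!`. Here is a small sketch on made-up data; the data frames `central` and `inaccurate` and the `rt` column are hypothetical stand-ins, not the asker's actual data:

```r
# Hypothetical stand-ins for the two data frames in the question
central <- data.frame(
    subject = c("s1", "s1", "s2", "s2", "s3", "s3"),
    rt = c(510, 495, 620, 640, 480, 505)  # made-up values
)
inaccurate <- data.frame(subject = c("s2"))

# TRUE for every row whose subject appears in the inaccurate list
is_inaccurate <- central$subject %in% inaccurate$subject

# Negate the test to drop those rows; every row of s1 and s3 remains
kept <- central[!is_inaccurate, ]
print(kept)
```

Because `%in%` checks membership rather than position, it does not matter that `central` has several rows per subject while `inaccurate` has one.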
Gregor Thomas