I've been struggling with this one for hours. I have a dataset with two columns, let's call them V1 and V2. I also have a list of important V1 values, Vx. I managed to get the subset of V1 values that appear in Vx with the intersect function:

intersect <- intersect(df$V1,Vx)

Now I am desperately trying to get V2 values, corresponding to this subset. I've tried with

subset <- df[intersect(df$V1,Vx),]

But it returns rows that are all NA. To explain with another case, I have a dataset

V1      V2
a54    hi
bc85   hk
sdx637 hi
vbd435 hk

And also a list, containing

l <- c("a54","sdx637")

What I am trying to get is:

 V1      V2
 a54    hi
 sdx637 hi

As I said, the code I've been using gives me all NAs. Are there any alternatives? Thank you very much.

vagabond
sdgaw erzswer

1 Answer

You can try

subset(df, V1 %in% l)
#      V1 V2
#1    a54 hi
#3 sdx637 hi

intersect can be used to get the common elements

 intersect(df$V1, l)
 #[1] "a54"    "sdx637"

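but this returns a character vector, not a logical index. Used as a row index, its values are looked up as row names, and since no row is named "a54" or "sdx637", every row comes back as NA: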

 df[intersect(df$V1, l),]
 #     V1   V2
 #NA   <NA> <NA>
 #NA.1 <NA> <NA>

%in%, by contrast, returns a logical vector with one element per row of df, which is exactly what row subsetting needs.
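
To make the difference concrete, here is a minimal sketch that rebuilds the example data from the question. The `stringsAsFactors = FALSE` argument and the names `idx` and `common` are assumptions for illustration, not part of the original post. It also shows that `match()` can turn the values returned by `intersect` into row positions, if you want to keep using `intersect`.

```r
# Rebuild the example data from the question.
# stringsAsFactors = FALSE is an assumption; the question does not say how V1 is stored.
df <- data.frame(V1 = c("a54", "bc85", "sdx637", "vbd435"),
                 V2 = c("hi", "hk", "hi", "hk"),
                 stringsAsFactors = FALSE)
l <- c("a54", "sdx637")

# %in% gives one TRUE/FALSE per row of df
idx <- df$V1 %in% l
idx
# [1]  TRUE FALSE  TRUE FALSE

# intersect() gives the matching values themselves, not row positions
common <- intersect(df$V1, l)

# A character row index is looked up in the row names, and these are "1".."4"
rownames(df)
# [1] "1" "2" "3" "4"

# match() converts the values into row positions (first occurrence only)
df[match(common, df$V1), ]
```

Note that `match()` only finds the first occurrence of each value, so `%in%` remains the safer choice when V1 may contain repeats.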

As @Steven Beaupré mentioned in the comments, other options include [ or filter from dplyr

  df[df$V1 %in% l,]

Or

  library(dplyr)
  filter(df, V1 %in% l)

Or

  library(data.table)
  setDT(df)[V1 %chin% l] 
akrun
  • Thank you, this seems to work. There is one other thing though. What could be the reason that my l is, let's say, 300 elements long, but the resulting subset is longer? Are there duplicates? – sdgaw erzswer May 03 '15 at 17:37
  • @sdgawerzswer The subset is longer because that column contains duplicates. You can check `any(table(df$V1)>1)` to see whether any value occurs more than once. – akrun May 03 '15 at 17:40
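
The duplicate check from the comment above can be tried on a small example. This is only a sketch: the data frame below is hypothetical (the question's data with "a54" repeated), and `res` and `res_unique` are illustrative names. Dropping repeats with `duplicated()` is one option, assuming you want a single row per V1 value.

```r
# Hypothetical data in which "a54" appears twice in V1
df <- data.frame(V1 = c("a54", "bc85", "sdx637", "a54"),
                 V2 = c("hi", "hk", "hi", "hk"),
                 stringsAsFactors = FALSE)
l <- c("a54", "sdx637")

# Every row whose V1 is in l is kept, including repeats
res <- df[df$V1 %in% l, ]
nrow(res)
# [1] 3

# The check suggested in the comment: is any V1 value repeated?
any(table(df$V1) > 1)
# [1] TRUE

# Which values are repeated?
unique(df$V1[duplicated(df$V1)])
# [1] "a54"

# Keep only the first row for each V1 value, if one row per key is wanted
res_unique <- res[!duplicated(res$V1), ]
```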