
I have a set of four vectors that look like this:

[1] PRI2CO       HEISCO       PRI2CO       DIALGU       DIALGU       ALSEBL      
Levels: ALSEBL       DIALGU       HEISCO       PRI2CO  

[1] PRI2CO       TET2PA       ALSEBL       PRI2CO       ALSEBL       TET2PA      
[7] HEISCO       TET2PA      
Levels: ALSEBL       HEISCO       PRI2CO       TET2PA

I would like to generate a vector that contains all values that match between every possible combination of the four vectors. For the two above, it would contain ALSEBL, HEISCO, and PRI2CO. I've been doing every combination by hand so far, but it's tedious and I figure there has to be a better way. I tried writing a loop for it, but I'm pretty new to R and it hasn't worked yet. Here's what I've been doing:

trees.species.P234<-intersect(intersect(trees.species.P2,trees.species.P3),trees.species.P4)
> trees.species.P234
[1] "PRI2CO      " "ALSEBL      "

I was thinking a for loop that involved a factorial might do it, but I can't get it to work.

brandonEm
  • Is this : http://stackoverflow.com/questions/22624284/r-intersecting-strings/22624311 kind of thing helpful? It sounds like what you want to do, but I'm not exactly sure. i.e. - `Reduce(intersect, list(one,two) )` works for your example and is extendable to 3+ vectors. – thelatemail Jul 15 '14 at 01:12
  • That looks promising! I'll give it a shot tomorrow and report back – brandonEm Jul 15 '14 at 02:05
  • You may also check `intersect2` from `library(MergeGUI)` – akrun Jul 15 '14 at 07:03
  • `Reduce` actually worked really well for finding the intersect of the vectors - before, I was just nesting intersects within each other (i.e. `intersect(intersect(a,b),c)`). What I want to do is apply that function to all of the possible combinations (maybe like `combn` below?) of the vectors - 1 and 2, 1 2 3, etc. - and maybe eventually get a count of how many intersections each unique ID shows up in. I think the comment below is getting to it, but I'm not proficient enough to figure that one out – brandonEm Jul 16 '14 at 01:33

3 Answers


Here you go, using the same vectors as proposed by gadzooks:

v1 <- c("PRI2CO","HEISCO","PRI2CO","DIALGU","DIALGU","ALSEBL")
v2 <- c("PRI2CO", "TET2PA","ALSEBL","PRI2CO","ALSEBL","TET2PA","HEISCO","TET2PA")
v3 <- c("PRI2CO","HEISCO","PRI2CO","DIALGU","DIALGU","ALSEBL")
v4 <- c("PRI2CO", "TET2PA","ALSEBL","PRI2CO","ALSEBL","TET2PA","HEISCO","TET2PA")

veclist <- list(v1,v2,v3,v4)
combos <- Reduce(c,lapply(2:length(veclist), 
            function(x) combn(1:length(veclist),x,simplify=FALSE) ))

lapply(combos, function(x) Reduce(intersect,veclist[x]) )

#[[1]]
#[1] "PRI2CO" "HEISCO" "ALSEBL"
# 
#[[2]]
#[1] "PRI2CO" "HEISCO" "DIALGU" "ALSEBL"
#
#[[3]]
#[1] "PRI2CO" "HEISCO" "ALSEBL"
#etc etc
thelatemail
  • That worked! Thanks a lot. I assigned the lapply list to `intersects` and added `table(unlist(intersects))` to get what I was specifically looking for - a count, for each unique ID, of all the combinations that the ID is present in. – brandonEm Jul 18 '14 at 21:16
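
For anyone landing here later, the counting step described in the comment above can be sketched roughly as follows. This is only a sketch built on the answer's example vectors; the names `intersects` and `counts` are illustrative, and `unlist(..., recursive = FALSE)` is swapped in for `Reduce(c, ...)` to flatten the list of combinations.

```r
v1 <- c("PRI2CO","HEISCO","PRI2CO","DIALGU","DIALGU","ALSEBL")
v2 <- c("PRI2CO","TET2PA","ALSEBL","PRI2CO","ALSEBL","TET2PA","HEISCO","TET2PA")
v3 <- v1  # same contents as v1, as in the answer's example
v4 <- v2  # same contents as v2, as in the answer's example
veclist <- list(v1, v2, v3, v4)

# All combinations of 2, 3 and 4 vectors, as a flat list of index vectors
combos <- unlist(
  lapply(2:length(veclist),
         function(k) combn(seq_along(veclist), k, simplify = FALSE)),
  recursive = FALSE)

# Intersection of the vectors in each combination
intersects <- lapply(combos, function(idx) Reduce(intersect, veclist[idx]))

# For each ID, the number of combinations whose intersection contains it
counts <- table(unlist(intersects))
print(counts)
```

Note that combinations with an empty intersection simply contribute nothing to the table, since `unlist` drops zero-length elements.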

First, list all the combinations. The `combn` function does that:

> combn(1:4,2)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    2    2    3
[2,]    2    3    4    3    4    4

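Now we can use the `apply` function to find the intersection for each pair of vectors. Before that, let's put the vectors in a list. For easy reproducibility, I created this small example list.

# One column per pair of indices (named `pairs` to avoid masking base::c)
pairs <- combn(1:4,2)
l <- list(c("a","b"),c("b","c"),c("c","d"),c("d","e"))
# Intersect the two vectors named by each column
Result <- apply(pairs,2,function(x){intersect(l[[x[1]]],l[[x[2]]])})

Here the result is a list, because the intersections differ in length. If you want it as a vector, you can use do.call: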

do.call("c",Result)
[1] "b" "c" "d"

To keep only the unique values:

unique(do.call("c",Result))

This can be used for large lists as well.
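
If it helps to see which pair each match came from, one possible variation is sketched below. It is only an illustration on the same toy list; the names `idx_pairs`, `res` and `nonempty` are made up for this example, and `lapply` over `combn(..., simplify = FALSE)` is used in place of `apply` so that the result is always a list.

```r
l <- list(c("a","b"), c("b","c"), c("c","d"), c("d","e"))

# Index pairs as a list of length-2 integer vectors
idx_pairs <- combn(seq_along(l), 2, simplify = FALSE)

# Intersection for each pair; always a list, even if all lengths match
res <- lapply(idx_pairs, function(p) intersect(l[[p[1]]], l[[p[2]]]))

# Label each result with the pair it came from, e.g. "1-2"
names(res) <- vapply(idx_pairs, paste, character(1), collapse = "-")

# Keep only the pairs that actually share something
nonempty <- res[lengths(res) > 0]
print(nonempty)
```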

Koundy

v1 <- c("PRI2CO","HEISCO","PRI2CO","DIALGU","DIALGU","ALSEBL")
v2 <- c("PRI2CO", "TET2PA","ALSEBL","PRI2CO","ALSEBL","TET2PA","HEISCO","TET2PA")
v3 <- c("PRI2CO","HEISCO","PRI2CO","DIALGU","DIALGU","ALSEBL")
v4 <- c("PRI2CO", "TET2PA","ALSEBL","PRI2CO","ALSEBL","TET2PA","HEISCO","TET2PA")

# Print every value that occurs in all four vectors
vall <- unique(c(v1,v2,v3,v4))
for(x in vall){
   # && is the scalar "and", appropriate inside if()
   if((x %in% v1) && (x %in% v2) && (x %in% v3) && (x %in% v4)){
      print(x)
   }
}