
I have a set of 6 vectors of different lengths (colnames: tp1-tp6). They look something like this:

    tp1     tp2     tp3     tp4     tp5     tp6
    K06167  K14521  K17095  K21805  K03238  K18213
    K07376  K17095  K01424  K13116  K03283  K14521
    K03347  K14521  K14319  K00799  K08901  K01756
    K20179  K01693  K01682  K03283  K02716  K03238
    K03527  K02882  K01414  K01693  K08907  K01850
    K08901  K02912  K00940  K14319  K00411  K01768
    K11481  K02868  K04043  K14835  K01414  K15335
    K02716  K14835  K12606  K19371  K00963  K12818
    K03545  K14766  K09550  K04043  K01749  K02975
    K08907  K00602  K15437  K09550  K03116  K03002
    K15470  K10798  K03456  K03687  K09550  K17679
    K16465  K14823  K18059  K03456  K08738  K13116
    K03116  K00940  K03115  K18534  K08907  K14521
    K08738  K16474  K15502  K03495  K03687  K01937
    K08907  K19371  K00026  K13100  K08907  K03002
    .
    .
    .

I would like to create a list that contains, for every possible combination of the 6 vectors, the K values that match within that combination. For instance, for the combination of tp2 and tp3, I want to find all of the values that the two vectors have in common but that don't appear in any of the other vectors (tp1, tp4, tp5, tp6). In this case it would be K00940. Is this possible with vectors of different lengths in R?

A similar question was asked in

Finding all possible combinations of vector intersections?

and I have tried the code from one of its answers. While that code does give me all possible combinations and their respective values in a large list, it does not account for the fact that I only want the unique intersections between the vectors. For instance, the combination of tp2 and tp3 gave me every value that the two vectors share, including values that also appear in the other vectors. I just want the values that only tp2 and tp3 have in common.

veclist <- list(tp1, tp2, tp3, tp4, tp5, tp6) 

combos <- Reduce(c,lapply(1:length(veclist), function(x) combn(1:length(veclist),x,simplify=FALSE)))

CKUP_combos <- lapply(combos, function(x) Reduce(intersect, veclist[x]) )

2 Answers

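# veclist must be a *named* list: sel() matches combinations by name
veclist = list(tp1=tp1, tp2=tp2, tp3=tp3, tp4=tp4, tp5=tp5, tp6=tp6)

sel = function(x)
{
  # x is one combination, i.e. a named sub-list of veclist
  sh = names(veclist) %in% names(x)
  # values shared by all selected vectors and absent from every other vector
  a = setdiff(Reduce(intersect, veclist[sh]), unlist(veclist[!sh]))
  if (length(a) > 0) setNames(list(a), toString(names(x)))
}

# apply sel to every combination of 1 to 6 vectors
res = Map(combn, list(veclist), 1:6, c(sel), simplify = FALSE)
unlist(unlist(res, FALSE), FALSE)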
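
The function above matches combinations through names(veclist), so it only has something to work with when the list is named. A minimal sketch of that point on toy data follows; the vectors a, b and d and the objects unnamed, named and only_ab are hypothetical and exist only for illustration.

```r
# Toy vectors of different lengths (hypothetical, for illustration only)
a <- c("K1", "K2", "K3")
b <- c("K2", "K3", "K4", "K5")
d <- c("K3", "K5")

unnamed <- list(a, b, d)
named   <- list(a = a, b = b, d = d)

is.null(names(unnamed))  # TRUE: there are no names for %in% to match
names(named)             # "a" "b" "d"

# Values shared by a and b, excluding anything that also appears in d
sh <- names(named) %in% c("a", "b")
only_ab <- setdiff(Reduce(intersect, named[sh]), unlist(named[!sh]))
only_ab                  # "K2"
```

With an unnamed list, both selections inside sel are empty, so every combination returns NULL.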
Onyambu
  • Sorry, this didn't work; I just see NULL in the console when I enter the code. The res object was just a list of 6 smaller lists of various lengths. – Johnson Lin Sep 26 '19 at 14:48

Define the following function:

getUniqueIntersections <- function(veclist, col1name, col2name){
  #Returns vector of all strings in components col1name and col2name of veclist
  # that are not in any of the other components of veclist.

  inc1 <- veclist[[col1name]]
  inc2 <- veclist[[col2name]]
  inc <- intersect(inc1, inc2) 

  excNames <- setdiff(names(veclist), c(col1name, col2name))
  exc <- unique(do.call(c, veclist[excNames]))

  result <- setdiff(inc, exc)

  return(result)
}

Next, define veclist as a named list of the vectors of interest, and then use those names to create a data frame of the pairs that we want to iterate through:

veclist <- list(tp1=tp1, tp2=tp2, tp3=tp3, tp4=tp4, tp5=tp5, tp6=tp6)
dfCombNames <- as.data.frame(combn(names(veclist), 2))
dfCombNames
#   V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15
#1 tp1 tp1 tp1 tp1 tp1 tp2 tp2 tp2 tp2 tp3 tp3 tp3 tp4 tp4 tp5
#2 tp2 tp3 tp4 tp5 tp6 tp3 tp4 tp5 tp6 tp4 tp5 tp6 tp5 tp6 tp6

Finally, create the result list by looping through each column in dfCombNames.

  • row1 and row2 of each column in dfCombNames are concatenated together to form list component key names, e.g. "tp2,tp3"
  • getUniqueIntersections is applied to values in row1 and row2, which correspond to the pairs of columns under consideration, to get the unique intersection values for that pair.
resultList <- list()
for(col in dfCombNames){
  col1 <- as.character(col[1])
  col2 <- as.character(col[2])
  compName <- paste(as.character(col), collapse=",")
  resultList[[compName]] <- getUniqueIntersections(veclist, col1, col2)
}

resultList should then contain the desired values, e.g.:

> resultList[["tp2,tp3"]]
[1] "K17095" "K00940"

> resultList[["tp1,tp5"]]
[1] "K08901" "K02716" "K08907" "K03116" "K08738"
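
The loop above covers pairs only, whereas the question asks about every possible combination. One way to extend the same idea to combinations of any size is sketched below on toy data. The helper uniqueIntersection, the list toy and the objects combos and allResults are hypothetical names introduced for this sketch, not part of the code above.

```r
# Hypothetical helper: values found in every vector named in `keep`
# and in none of the remaining vectors of `veclist`.
uniqueIntersection <- function(veclist, keep) {
  shared <- Reduce(intersect, veclist[keep])
  others <- unlist(veclist[setdiff(names(veclist), keep)], use.names = FALSE)
  setdiff(shared, others)
}

# Toy data with different lengths (not the asker's real vectors)
toy <- list(
  tp1 = c("K1", "K2", "K3", "K6"),
  tp2 = c("K2", "K3", "K4"),
  tp3 = c("K3", "K4", "K5", "K6", "K7")
)

# Every combination of names, from size 1 up to all vectors
combos <- unlist(
  lapply(seq_along(toy), function(m) combn(names(toy), m, simplify = FALSE)),
  recursive = FALSE
)

# One result per combination, keyed like "tp1,tp2"
allResults <- lapply(combos, function(keep) uniqueIntersection(toy, keep))
names(allResults) <- vapply(combos, paste, character(1), collapse = ",")

# Drop combinations that have no unique shared values
allResults <- Filter(length, allResults)

allResults[["tp1,tp2"]]  # "K2"
allResults[["tp2,tp3"]]  # "K4"
```

Working with names rather than positions keeps the result keys readable, and the same helper handles single vectors (values unique to one vector) as well as the full set.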
  • Hi, everything worked up until the forloop; when I entered the code resultList[[compName]] <- getUniqueIntersections(df, col1, col2), the following error message appeared: Error in UseMethod("pull") : no applicable method for 'pull' applied to an object of class "function" – Johnson Lin Sep 26 '19 at 14:41
  • Have now modified the code to handle vectors of different lengths. Please check again with the new code. – Ben Wynne-Morris Sep 26 '19 at 18:14
  • Hi Ben, an "error in combn(names(veclist), 2) : n < m" occurred when I ran the code dfCombNames <- as.data.frame(combn(names(veclist), 2)) – Johnson Lin Sep 27 '19 at 13:08
  • Did you define veclist <- list(tp1, tp2, tp3, tp4, tp5, tp6) first? – Ben Wynne-Morris Sep 28 '19 at 10:57
  • Hi Ben, yup I defined veclist first and it still came up with the error. – Johnson Lin Sep 29 '19 at 17:22
  • Ok, it just occurred to me that I gave veclist names... Please try veclist <- list(tp1=tp1, tp2=tp2, tp3=tp3, tp4=tp4, tp5=tp5, tp6=tp6), then it will hopefully all work for you. – Ben Wynne-Morris Sep 30 '19 at 07:12
  • Hey, I tried the new code, and now the error is " Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 118, 99, 87, 513 " when I ran "resultList[[compName]] <- getUniqueIntersections(veclistUP, col1, col2)" Do you think this is probably a result of the fact that I have vectors in different lengths? How do I go about fixing that? Thanks! – Johnson Lin Sep 30 '19 at 14:41
  • The code should handle vectors of different lengths but please check you are using the latest version of getUniqueIntersections and not the original version of this function. – Ben Wynne-Morris Sep 30 '19 at 17:14
  • Hi Ben, I've used the latest version of getUniqueIntersections, and the error still pops up for some reason. – Johnson Lin Sep 30 '19 at 20:45
  • I've tested the code works for small toy examples including different length vectors so there must be some edge case causing an issue. Things to try: (1) Create a VERY simple veclist and get that working. When confident it works for simple cases, build things by adding tp1, tp2, .. to your veclist so you're able to pinpoint exactly when it fails (2) Try for a single pair before the for loop, e.g. getUniqueIntersections(veclist, "tp1", "tp5") (3) Inspect veclist to ensure all the list components really are vectors and not nested lists etc. Debugging, albeit frustrating, is where we learn most! – Ben Wynne-Morris Sep 30 '19 at 21:54
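
Check (3) from the comment above could be sketched roughly as follows. This assumes, purely for illustration, that one component of the list turned out to be a data frame rather than a plain vector; veclistCheck, isVec and fixed are hypothetical names, and whether this is the actual cause of the reported error is not known from the thread.

```r
# Hypothetical list with one bad component (a data frame instead of a vector)
veclistCheck <- list(
  tp1 = c("K1", "K2"),
  tp2 = data.frame(id = c("K2", "K3")),
  tp3 = c("K2", "K4", "K5")
)

# Which components are plain atomic vectors?
isVec <- vapply(veclistCheck, is.atomic, logical(1))
isVec

# One possible repair: flatten every component to a character vector
fixed <- lapply(veclistCheck, function(v) {
  as.character(unlist(v, use.names = FALSE))
})
```

Any component flagged FALSE in isVec is worth inspecting with str() before running the loop.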