2

not sure if I chose a good title at all... Also I don't know if I use the right terminology, so maybe with the right search terms I would find a solution to this problem...

I have a list of strings from which I want to have all sets of "exclusive" combinations of 3.

Example: With the following

require(utils)
mylist<-c("strA","strB","strC","strD","strE","strF")
t(combn(mylist,3))

I get a table that lists all possible combinations of 3 out of those 6 strings (so each row represents one combination of 3):

        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strA" "strB" "strD"
   [3,] "strA" "strB" "strE"
   [4,] "strA" "strB" "strF"
   [5,] "strA" "strC" "strD"
   [6,] "strA" "strC" "strE"
   [7,] "strA" "strC" "strF"
   [8,] "strA" "strD" "strE"
   [9,] "strA" "strD" "strF"
  [10,] "strA" "strE" "strF"
  [11,] "strB" "strC" "strD"
  [12,] "strB" "strC" "strE"
  [13,] "strB" "strC" "strF"
  [14,] "strB" "strD" "strE"
  [15,] "strB" "strD" "strF"
  [16,] "strB" "strE" "strF"
  [17,] "strC" "strD" "strE"
  [18,] "strC" "strD" "strF"
  [19,] "strC" "strE" "strF"
  [20,] "strD" "strE" "strF"

but I would like to have all sets of combinations of 3, in which each string only appears once. So my desired output would look sth like this:

$1
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "srtE" "strF"
$2
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strD"
   [1,] "strC" "strE" "strF"
$3
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strE"
   [1,] "strC" "strD" "strF"
...

So here each subelement ($1, $2, $3, etc.) contains 2 combinations of 3 strings (as 2*3=6; with 6 strings). In each set each string must not appear more than once.

Of course it would be great if this would be also possible for lengths of mylist that are not a multiple of n=3. In case we have let's say 10 strings (with the addition of "strG", "strH", "strI", and "strJ"), I would like to have one string ommitted in each combination. So the desired result would be sth like

$1
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strH" "strI"
$2
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strH" "strJ"
$3
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strI" "strJ"
$4
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strE" "strF" "strG"
   [3,] "strH" "strI" "strJ"
...

Does someone have a solution to this? If my explanation was unclear, please let me know.

Cheers

Ingo
  • 53
  • 6
  • You don't want combinations, you want permutations. It's also unclear what is desired when lengths (of what ever result is expected) are not equal to 'n'. Do you want NA's in hte results and if so will the postion of hte NA create a new unique instance? – IRTFM Dec 03 '15 at 17:32
  • 42, thanks for this hint. But as far as I understood in permutations the order is important, which is not what I want. However I added some words in the original question, does it get any clearer? – Ingo Dec 03 '15 at 18:51
  • Then how is it not just a restructuring of `combn(, ..., 6)`, perhaps `lapply( apply( combn( mylist, 6), 2, list), matrix, ncol=3)` – IRTFM Dec 03 '15 at 19:07
  • Maybe just because I don't know how... But I would say it is not that simple because the desired result is somewhat larger than `combn`'s output. How would you select the rows for the individual sets? EDIT: your suggestion gives `[[1]] [,1] [,2] [,3] [1,] Character,6 Character,6 Character,6` – Ingo Dec 03 '15 at 19:12
  • The items in your desired output _are_ just permutations arranged as 2x3 matrices. There would be only one unique combination of 6 elements from a list of 6. – IRTFM Dec 03 '15 at 19:21
  • but I want combinations of 3! Each row represents one combination. – Ingo Dec 03 '15 at 19:21

2 Answers2

1

Will assume the transposed combo matrix is named mat. Check to see if there is any overlap with length applied to results of intersect function:

 res <- list();
 for (i in 1:nrow(mat) ){
    for( j in 1:nrow(mat)){  
          if( !length(intersect(mat[i,] , mat[j,])) ) 
               res[[paste(i,j,sep="_")]] <- rbind( mat[i,], mat[j, ]) } }


> res
$`1_20`
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"

$`2_19`
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strD"
[2,] "strC" "strE" "strF"

$`3_18`
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strE"
[2,] "strC" "strD" "strF"

.... snipped

Depending on your definition of "unique" you might decide to only take the first ten items since half of these are transpositions of rows:

> res[[1]]
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
> res[[20]]
     [,1]   [,2]   [,3]  
[1,] "strD" "strE" "strF"
[2,] "strA" "strB" "strC"
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • thanks again, 42, I really appreciate! I fear we have a misunderstanding here. I want combinations of 3. Each of the first level elements of the above shown "desired result" contains 2 combinations of 3, in which each string appears only once (as 2*3=6). I would like to obtain all possible sets of these combinations. – Ingo Dec 03 '15 at 19:39
  • All right. I will replace my answer with one that delivers the desired result, which I now understand. (I think there are only 10 such pairs, since for each one in the transposed matrix, there is only one that satisfies the requirements.) – IRTFM Dec 03 '15 at 19:42
  • Thank you, 42, this indeed solves the problem for the example I gave. By trying to be general I created a too simple problem... In fact I have 49 strings, so in each set I would need 16 nested for loops (16*3=48), and `intersect` would probably not work, sorry for being not specific enough. But your solution helped me already quite a lot. Maybe I find a way from this point. thanks again! – Ingo Dec 03 '15 at 20:44
1

Based on 42's help (thanks again!) I figured out a way which is anything but elegant, but does the job (slowly...). But only because I could eliminate some of the possible combinations before doing the following steps it was feasible this way. In my original problem I have 49 strings, which would result in extremely big vectors, so be careful when applying the following steps to more than, say, 15 strings. There is surely a way to calculate how many combinations have to be processed...

Here comes the complete example

require(utils)
mylist<-paste("str",LETTERS[1:10],sep="")
mat<-as.data.frame(t(combn(mylist, 3, simplify = TRUE)))
mat[] <- lapply(mat, as.character)

mat.subset<-list()
for (i in seq(nrow(mat)))
{
  mat.temp<-mat
  j=1
  mat.subset[[i]]<-mat[i,]
  rem.row<-sort(unique(c(which(mat.temp[,1]%in%mat[i,1:3]),which(mat.temp[,2]%in%mat[i,1:3]),which(mat.temp[,3]%in%mat[i,1:3]))))
  mat.temp<-mat.temp[-rem.row,]
  while (j<=nrow(mat.temp))
  {
    if(!length(intersect(mat.temp[j,1:3],unlist(mat.subset[[i]]))))
    {
      mat.subset[[i]]<-rbind(mat.subset[[i]],mat.temp[j,])
      rem.row<-sort(unique(c(which(mat.temp[,1]%in%mat.temp[j,1:3]),which(mat.temp[,2]%in%mat[i,1:3]),which(mat.temp[,3]%in%mat[i,1:3]))))
      mat.temp<-mat.temp[-rem.row,]
    }
    j<-j+1
  }
}
mat.subset.lengths<-unlist(lapply(mat.subset,function(x) nrow(x)))
mat.subset<-mat.subset[which(mat.subset.lengths==max(mat.subset.lengths))]

The last two steps were necessary in my case as I had, as stated above, excluded some combinations before the time consuming for loop, and only a certain number of starting points would yield a complete solution (or in the worst case close to complete solution).

If you have a hint that there are more sets than covered with this procedure or if you have a more elgant way, I would appreciate your input.

Ingo
  • 53
  • 6
  • thinking about my solution for a while made me believe that I don't get all possible sets in this way. In the example above `mat.subset` has the length of 120. I fear there are more possible sets of combinations as for each starting point only one "path" is sampled, while there are surely many more . I would be delighted if someone finds a) this to be true and b) a way to explore all these. – Ingo Dec 04 '15 at 12:45
  • So indeed, this solution seems to disregard "some" sets... Someone on a math forum developed a formula ( P(m,n*[m/n])/(n!^[m/n] * [m/n]!) ) to calculate the number of possibel sets, which would give 2800 sets in the C(10,3) case... help, anyone? – Ingo Dec 07 '15 at 15:39