I am seeking some help regarding combinations in R.
Simplified, I have a group consisting of three individuals (I) and some markers (M) I am interested in studying. Each individual will either be positive or negative for each marker. Some markers will be positive in two individuals from the group and some will be positive in three individuals in the group.
I am interested in finding all the ways the positive markers can be distributed among the members of the group, such that each marker is studied only once in any particular grouping.
# three possible individuals in the group
I <- c("I1","I2","I3")
# 8 possible markers in the group
M <- paste0("M", seq(1,8))
# each marker can be either present (TRUE) or absent (FALSE)
# in general, more markers are present than absent
# this is random data for the purpose of example
P <- sample(c(rep(TRUE, 16),rep(FALSE, 8)))
# the input data looks like this
d <- data.frame(I=rep(I, each=8), M=rep(M, 3), P=P)
   I  M     P
1 I1 M1  TRUE
2 I1 M2 FALSE
3 I1 M3  TRUE
4 I1 M4 FALSE
5 I1 M5 FALSE
6 I1 M6  TRUE
My preferred output would be a long data frame like:
Option  I  M
1       1  1
1       1  2
1       2  3
2       1  2
2       2  1
2       2  3
Each option is a unique distribution of positive markers between each of the three members of the group. This is the equivalent of a wide data frame like:
Option  I1      I2      I3
1       M1, M2  M3
2       M2      M1, M3
3
The key challenges are that (i) all markers must be represented in each option and (ii) each marker should be studied only once (in one individual from the group) in each option. Not every individual has to be represented in each option.
I suspect the solution will consist of the following key steps:
- Generate all possible combinations of markers distributed to each of the three group members without duplication of the markers between individuals.
- Remove combinations where the marker being tested is not present (TRUE) in the particular individual to which it is allocated.
- Remove combinations where there is duplication of markers within the combination if not already successfully achieved in step one.
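For illustration, here is a minimal sketch of how these steps might collapse into a single expand.grid call. The idea is to list, for each marker, the individuals in which it is positive, and then take the Cartesian product of those lists. Every row of the product assigns each marker to exactly one individual that carries it, so the duplicate-removal step should not be needed. The names carriers, wide, long and wide_by_I are made up for this example, the seed is only there to make the random data repeatable, and dropping markers that are positive in nobody is an assumption (such a marker could never satisfy condition (i)).

```r
# Example data in the same shape as above; the seed is only for repeatability
set.seed(1)
I <- c("I1", "I2", "I3")
M <- paste0("M", 1:8)
d <- data.frame(I = rep(I, each = 8), M = rep(M, 3),
                P = sample(c(rep(TRUE, 16), rep(FALSE, 8))),
                stringsAsFactors = FALSE)

# For each marker, the individuals in which it is positive
pos <- d[d$P, ]
carriers <- split(pos$I, factor(pos$M, levels = M))

# Assumption: markers that are positive in nobody are dropped
carriers <- carriers[lengths(carriers) > 0]

# One row per option, one column per marker;
# each cell holds the individual the marker is assigned to
wide <- expand.grid(carriers, stringsAsFactors = FALSE)

# Long format: one row per (option, individual, marker)
long <- data.frame(
  Option = rep(seq_len(nrow(wide)), times = ncol(wide)),
  I      = unlist(wide, use.names = FALSE),
  M      = rep(names(wide), each = nrow(wide)),
  stringsAsFactors = FALSE
)
long <- long[order(long$Option, long$I, long$M), ]
rownames(long) <- NULL

# Wide format by individual: markers pasted together, NA where none assigned
wide_by_I <- tapply(long$M,
                    list(Option = long$Option, I = factor(long$I, levels = I)),
                    paste, collapse = ", ")
```

The number of options is the product of the number of positive individuals per marker, prod(lengths(carriers)), so it grows quickly as markers are added; with many markers it may be more practical to enumerate options lazily than to build the whole table.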
I've really struggled with this and have spent the whole day writing a very complex, loop-based approach using expand.grid and combn, which has been unsuccessful. It is too complicated and shambolic to include here. Any help is appreciated.