I have data of the following type, representing combinations of factor levels:

P1 <- c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d")
P2 <- c("a", "b", "c", "d", "b", "c", "d", "c", "d", "d")
myfactors <- data.frame(P1, P2)

   P1 P2
1   a  a
2   a  b
3   a  c
4   a  d
5   b  b
6   b  c
7   b  d
8   c  c
9   c  d
10  d  d

In the real world there may be any number of factor levels, so I am trying to write a function that works for any number of levels. I want to set up contrasts for all combinations available in the data set; for example, in this data set: a-b, a-c, a-d, b-c, b-d, c-d. The contrast rule is as follows.

For example, for "a-b": if P1 = P2 = a or P1 = P2 = b, the coefficient = -1,
if P1 = a, P2 = b or P1 = b, P2 = a, the coefficient = 2,
   else coefficient = 0
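
The rule for a single contrast can be sketched as a small R function. This is only an illustration of the rule as written above; the name `contrast_coef` and its arguments are hypothetical and not part of any package.

```r
# Hypothetical helper: coefficients of the contrast "x-y" for every row.
contrast_coef <- function(P1, P2, x, y) {
  P1 <- as.character(P1)
  P2 <- as.character(P2)
  # TRUE where the row pairs x with y, in either order (worth +2)
  cross <- (P1 == x & P2 == y) | (P1 == y & P2 == x)
  # TRUE where both columns hold the same level and it is x or y (worth -1)
  self <- P1 == P2 & P1 %in% c(x, y)
  # Logical arithmetic: TRUE counts as 1, FALSE as 0; everything else is 0
  2 * cross - self
}

P1 <- c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d")
P2 <- c("a", "b", "c", "d", "b", "c", "d", "c", "d", "d")
ab <- contrast_coef(P1, P2, "a", "b")
ab
```

Under the rule as stated, a row with P1 = P2 = b receives -1 for a-b, just like a row with P1 = P2 = a.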

The output coefficient matrix will look like the following:

P1  P2  a-b a-c a-d b-c b-d c-d
a   a   -1  -1  -1  0   0   0
a   b   2   0   0   0   0   0
a   c   0   2   0   0   0   0
a   d   0   0   2   0   0   0
b   b   1   0   0   -1  -1  0
b   c   0   0   0   2   0   0
b   d   0   0   0   0   2   0
c   c   0   1   0   0   0   -1
c   d   0   0   0   -1  0   2
d   d   0   0   -1  0   -1  -1

Since the function I have in mind should be flexible, if I apply it to the following data set,

P1 <- c("CI", "CI", "CI", "CD", "CD", "CK", "CK")
P2 <- c("CI", "CD", "CK", "CD", "CK", "CK", "CI")
 mydf2 <- data.frame(P1, P2)
 mydf2
  P1 P2
1 CI CI
2 CI CD
3 CI CK
4 CD CD
5 CD CK
6 CK CK
7 CK CI

The expected coefficient matrix for this dataframe is:

P1  P2  CI-CD    CI-CK  CD-CK   CK-CI
CI  CI    -1      -1      0   -1
CI  CD     2       0      0    0
CI  CK     0       2      0    0
CD  CD    -1       0     -1    0
CD  CK     0       0      2    0
CK  CK     0      -1     -1   -1
CK  CI     0       0      0    2

I tried several ways but could not come up with a working program.

EDITS:

(1) I am not testing all possible combinations; only the combinations that appear in P1 and P2 are tested.

(2) I intend to develop a solution not only for this instance but for general application, for example to the myfactors data frame above.

jon

1 Answer

You didn't supply a reason for your particular choice of the 6 ordered combinations of P1 and P2 values, so I just ran through them all:

combos <- cbind( combn(unique(c(P2, P1)), 2), combn(unique(c(P2, P1)), 2)[2:1, ])
combos
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "CI" "CI" "CD" "CD" "CK" "CK"
[2,] "CD" "CK" "CK" "CI" "CI" "CD"
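
If only the pairs that actually occur in the data are wanted, as edit (1) of the question asks, the pair matrix could instead be built from the observed rows. The following is a minimal sketch under that assumption; the names `keep`, `observed` and `combos_obs` are hypothetical.

```r
P1 <- c("CI", "CI", "CI", "CD", "CD", "CK", "CK")
P2 <- c("CI", "CD", "CK", "CD", "CK", "CK", "CI")

# Keep only rows where the two levels differ, then drop repeated pairs.
keep <- P1 != P2
observed <- unique(cbind(P1[keep], P2[keep]))

# Transpose so each column is one pair, the same layout as `combos`.
combos_obs <- t(observed)
colnames(combos_obs) <- paste(combos_obs[1, ], combos_obs[2, ], sep = "-")
combos_obs
```

For this data the columns come out in the order CI-CD, CI-CK, CD-CK, CK-CI, which is the set of contrasts listed in the question's expected matrix. Because the pairs are ordered, CI-CK and CK-CI stay separate columns.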

As I went through the logic, it seemed more compact to test for conditions 1) and 2) and just use Boolean math to return the results. If both conditions are false you get 0. I've checked the entries that do not match yours, and I think your construction was wrong in spots. You have 0 in the "CI-CK" column at row 7, and I think the answer by your rules should be 2:

sapply(1:ncol(combos), function(x) with( mydf2,  
      2*( (P1==combos[1,x] & P2 == combos[2,x]) | (P2==combos[1,x] & P1 == combos[2,x])) - 
       (P1 == P2 & P1 %in% combos[,x]) ) )
#---------------
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   -1   -1    0   -1   -1    0
[2,]    2    0    0    2    0    0
[3,]    0    2    0    0    2    0
[4,]   -1    0   -1   -1    0   -1
[5,]    0    0    2    0    0    2
[6,]    0   -1   -1    0   -1   -1
[7,]    0    2    0    0    2    0

#------------------
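 # Append one coefficient column per pair in combos, then name each column after its pair
 mydf2[ , 3:(ncol(combos)+2)] <- sapply(1:ncol(combos), function(x) with( mydf2,  
      2*( (P1==combos[1,x] & P2 == combos[2,x]) | (P2==combos[1,x] & P1 == combos[2,x])) - 
       (P1 == P2 & P1 %in% combos[,x]) ) )
 names(mydf2)[3:(ncol(combos)+2)] <- paste(combos[1,], combos[2,], sep="-")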
 mydf2
 #-----------------
  P1 P2 CI-CD CI-CK CD-CK CD-CI CK-CI CK-CD
1 CI CI    -1    -1     0    -1    -1     0
2 CI CD     2     0     0     2     0     0
3 CI CK     0     2     0     0     2     0
4 CD CD    -1     0    -1    -1     0    -1
5 CD CK     0     0     2     0     0     2
6 CK CK     0    -1    -1     0    -1    -1
7 CK CI     0     2     0     0     2     0
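
For general use, the same Boolean arithmetic could be wrapped in a function that takes any data frame with columns P1 and P2. This is a sketch rather than a definitive implementation: the name `contrast_matrix` is hypothetical, and it assumes only unordered pairs are needed, since the symmetric rule gives identical columns for a pair and its reverse.

```r
# Hypothetical wrapper around the approach above.
contrast_matrix <- function(df) {
  P1 <- as.character(df$P1)
  P2 <- as.character(df$P2)
  # One column per unordered pair of levels found in either column
  combos <- combn(unique(c(P1, P2)), 2)
  coefs <- sapply(seq_len(ncol(combos)), function(x) {
    2 * ((P1 == combos[1, x] & P2 == combos[2, x]) |
         (P2 == combos[1, x] & P1 == combos[2, x])) -
      (P1 == P2 & P1 %in% combos[, x])
  })
  colnames(coefs) <- paste(combos[1, ], combos[2, ], sep = "-")
  # check.names = FALSE keeps names such as "a-b" unchanged
  data.frame(df, coefs, check.names = FALSE)
}

myfactors <- data.frame(
  P1 = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d"),
  P2 = c("a", "b", "c", "d", "b", "c", "d", "c", "d", "d")
)
res <- contrast_matrix(myfactors)
names(res)
```

Applied to the question's `myfactors` data, this yields the six contrast columns a-b, a-c, a-d, b-c, b-d, c-d without hard-coding the number of columns.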
IRTFM
  • Excellent, thank you. Two issues: (1) Regarding your question of why not all possible combinations: I do not need every possible combination of these two factors, because not all combinations are tested. In this case only the tested combinations, and those where P1 and P2 are identical, are provided, which is why the CK-CD comparison you got has no "2": no CK-CD combination was tested. (2) For `combochk <- data.frame(mydf2, list(NA,NA, NA, NA,NA, NA))`, how can we automate the number of NAs? I want this function to have general application (for example, the myfactors df is different); here we know exactly 6 combinations are tested. – jon Nov 06 '11 at 16:44
  • I later realized that the `combochk <- data.frame(mydf2, list(NA,NA, NA, NA,NA, NA))` code was not needed. As long as you set up the `combos` matrix to your liking, you can just use `mydf2[, 3:(ncol(combos)+2)] <- sapply(.....)`. I edited the code to make it less kludgy. – IRTFM Nov 06 '11 at 16:49
  • you are right CK-CI or CI-CK should be 2, if I apply the reciprocal rule ...but I was previously mistaken that I do want a reciprocal rule.. by removing "or P1= b, P2 = a" condition out of it...not realized previously ...thanks – jon Nov 06 '11 at 17:10