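When working with DNA, we often need the triangular p-distance matrix, which contains the proportion of non-identical sites between each pair of aligned sequences. For example, these three sequences of length 5:
- AGGTT
- AGCTA
- AGGTA

yield the following lower-triangular matrix (row and column labels are sequence indices):
```
    1   2
2 0.4
3 0.2 0.2
```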
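Written out as a formula, with notation introduced here only for illustration: let x_ik be the base (or its code) of sequence i at site k, and let L be the number of sites (the `bp` of the code below). Then the p-distance is the mean of an indicator that is 1 where two sequences differ:

```latex
p_{ij} = \frac{1}{L} \sum_{k=1}^{L} \mathbf{1}\!\left[x_{ik} \neq x_{jk}\right]
```

For sequences 1 and 2 (AGGTT vs AGCTA), only sites 3 and 5 differ, so p_12 = 2/5 = 0.4. The formula only asks whether two sites are identical, so it applies unchanged to any one-to-one numeric coding of the bases.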
The p-distance calculation is available in certain R packages, but suppose I need to work with a numerical code (-1, 0, 1, 2) rather than letters (C, T, A, G), i.e. C = -1, T = 0, A = 1, G = 2. How do I generate the triangular p-distance matrix from `my.matrix`, built as follows?
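```r
# Define DNA matrix dimensions
bp <- 5  # number of sites (columns)
n <- 3   # number of sequences (rows)

# Build binary matrices: purine (A/G = 1) and ketone (keto bases G/T = 1)
purine <- matrix(sample(0:1, bp * n, replace = TRUE, prob = c(0.5, 0.5)), n, bp)
ketone <- matrix(sample(0:1, bp * n, replace = TRUE, prob = c(0.5, 0.5)), n, bp)
strong <- 1 - abs(purine - ketone)  # 1 where purine == ketone, i.e. C or G

# Combine into a single code: C = -1, T = 0, A = 1, G = 2
my.matrix <- (purine * strong - ketone) + (purine * ketone - strong) + purine + ketone
my.matrix
```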
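One possible approach, sketched under the assumption that each row is an aligned sequence of equal length with no gaps or missing data: compare the rows directly, since p-distance only depends on whether two codes are equal, not on what the codes are. The helper name `p_distance` is hypothetical. It returns a base-R `dist` object (via `stats::as.dist`), whose print method shows only the lower triangle.

```r
# Hypothetical helper: pairwise p-distance between the rows of a numeric matrix.
# Assumes rows are aligned sequences of equal length with no gaps or missing data.
p_distance <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Proportion of sites whose codes differ between sequences i and j
      d[i, j] <- d[j, i] <- mean(x[i, ] != x[j, ])
    }
  }
  as.dist(d)  # keep only the lower triangle
}

# The three example sequences, coded as C = -1, T = 0, A = 1, G = 2
ex <- rbind(
  c(1, 2,  2, 0, 0),  # AGGTT
  c(1, 2, -1, 0, 1),  # AGCTA
  c(1, 2,  2, 0, 1)   # AGGTA
)
p_distance(ex)
```

On the example this should reproduce the 0.4 / 0.2 / 0.2 matrix from the top of the question; for the simulated data the call is simply `p_distance(my.matrix)`. The explicit loop over pairs keeps the logic transparent rather than optimized.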