I have a dataframe with binary values like so:
df<-data.frame(a=rep(c(1,0),9),b=rep(c(0,1,0),6),c=rep(c(0,1),9))
The goal is first to obtain all pairwise combinations of columns:
combos <- function(df, n) {
  # For each size in n, build every combination of columns as a data frame
  unlist(lapply(n, function(x) combn(df, x, simplify = FALSE)), recursive = FALSE)
}
combos(df,2)->j
Next, for each data frame in list j, I want the proportion of rows in which the two columns agree, i.e. are either (0,0) or (1,1). I can get the proportions like so:
lapply(j, function(x) data.frame(new = rowSums(x[,1:2])))->k
lapply(k, function(x) data.frame(prop1 = length(which(x==1))/18,prop2=length(which(x==0|x==2))/18))
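For comparison, here is a minimal sketch of the same agreement proportion computed without the hardcoded 18. It assumes every column is strictly 0/1 with no missing values; the names prop_same, agree and prop_same_matrix are made up for illustration.

```r
# Example data from above
df <- data.frame(a = rep(c(1, 0), 9), b = rep(c(0, 1, 0), 6), c = rep(c(0, 1), 9))
j <- combn(df, 2, simplify = FALSE)

# Per pair: mean() of a logical vector is the fraction of TRUE values,
# so this is the share of rows that are (0,0) or (1,1).
prop_same <- sapply(j, function(x) mean(x[[1]] == x[[2]]))

# All pairs at once: crossprod(m) counts (1,1) rows and crossprod(1 - m)
# counts (0,0) rows for every pair of columns.
m <- as.matrix(df)
agree <- (crossprod(m) + crossprod(1 - m)) / nrow(m)
prop_same_matrix <- agree[upper.tri(agree)]  # same pair order as combn()
```

The matrix version avoids building the list of pairs at all, which may matter at 250 x 400, though I have not timed it on the real data.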
However, this seems slow and complicated for larger lists. A couple of questions:
1) Is there a faster or better method than this? My actual list is 20 data frames, each with dimensions 250 x 400. I tried dist(df, method="binary"), but it looks like the binary method does not take (0,0) instances into account.
2) Also, why does dividing by length(x[1]) or lengths(x[1]) not give me 18? In the example I divided by the length of the vector new, which I specified by hand.
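To make question 2 concrete, here is a small sketch of what the different length calls return on a one-column data frame. k1 is a hypothetical stand-in for one element of k, not the real data.

```r
k1 <- data.frame(new = rep(c(0, 1, 2), 6))  # stand-in for one element of k

length(k1[1])    # 1: k1[1] is still a data frame, so this counts columns
nrow(k1)         # 18: number of rows
length(k1[[1]])  # 18: k1[[1]] extracts the column as a plain vector
```

So single brackets seem to keep the data frame wrapper, while nrow() or double brackets give the row count.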
Any help is very much appreciated!