I am trying to get a frequency table from this dataframe:

tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L),
                       a3 = c(0L, 1L, 0L), b1 = c(1L, 0L, 1L),
                       b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L)),
                       .Names = c("a1", "a2", "a3", "b1", "b2", "b3"),
                       class = "data.frame", row.names = c(NA, -3L))


tmp2 <- read.csv("tmp2.csv", sep=";")  # the same data, read from a semicolon-separated file
> tmp2
  a1 a2 a3 b1 b2 b3
1  1  1  0  1  1  0
2  0  0  1  0  0  1
3  0  1  0  1  0  1

I tried to get a frequency table as follows:

table(tmp2[,1:3], tmp2[,4:6])

But I get:

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
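The error comes from the fact that table() cross-tabulates atomic vectors (or factors), whereas tmp2[,1:3] and tmp2[,4:6] are data frames, i.e. lists. A minimal sketch of what table() does with a single pair of columns; tmp2 is re-created inline so the snippet runs on its own, and t1 is just an illustrative name:

```r
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

# table() on two atomic vectors gives a 2 x 2 table of their 0/1 combinations
t1 <- table(tmp2$a1, tmp2$b1)
print(t1)

# The "1","1" cell counts rows where both a1 and b1 are 1,
# which is only one cell of the a-by-b table the question asks for
t1["1", "1"]  # 1
```

Repeating this for every (a, b) pair and keeping only the "1","1" cell is what the answers below do in various ways.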

Expected output: [image of the expected table not available]

Note: the result is not necessarily a square matrix; for instance, I should be able to add b4 and b5 while keeping a1, a2, a3.

Brian Tompsett - 汤莱恩
S12000

3 Answers


An option:

matrix(colSums(tmp2[,rep(1:3,3)] & tmp2[,rep(4:6,each=3)]),
       ncol=3,nrow=3,
       dimnames=list(colnames(tmp2)[1:3],colnames(tmp2)[4:6]))
#   b1 b2 b3
#a1  1  1  0
#a2  2  1  1
#a3  0  0  1
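The indexing works because rep(1:3, 3) cycles through the a columns while rep(4:6, each = 3) holds each b column for three positions, so the nine column pairs come out in column-major order, which is exactly the order in which matrix() fills a 3 x 3 result. A small check of that pairing, where cn, pair_labels and lab are illustrative names only:

```r
cn <- c("a1", "a2", "a3", "b1", "b2", "b3")

# Label each (a, b) pair in the same order as the answer's column indexing
pair_labels <- paste(cn[rep(1:3, 3)], cn[rep(4:6, each = 3)])
print(pair_labels)

# Filling a 3 x 3 matrix column by column puts each a in a row and each b in a column
lab <- matrix(pair_labels, nrow = 3, ncol = 3,
              dimnames = list(cn[1:3], cn[4:6]))
print(lab)
```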

If you have a different number of a and b columns, you can try:

acols <- 1:3  # indices of the a columns
bcols <- 4:6  # same for b; if you add a column this should be 4:7
matrix(colSums(tmp2[, rep(acols, length(bcols))] & tmp2[, rep(bcols, each = length(acols))]),
       ncol = length(bcols), nrow = length(acols),
       dimnames = list(colnames(tmp2)[acols], colnames(tmp2)[bcols]))
nicola
  • Thanks, this is interesting. Will it work if I have, for instance, a1 a2 a3 and b1 b2 b3 b4 (that is, adding b4)? – S12000 Apr 13 '16 at 12:43
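Regarding the question in the comment: the generalized version only depends on acols and bcols, so it should handle a b4 column. A minimal sketch that wraps it in a helper; the name cooc_counts is hypothetical, drop = FALSE is added so a single a or b column does not collapse to a vector, and the b4 values are made-up data used only to show a non-square (3 x 4) result:

```r
cooc_counts <- function(df, acols, bcols) {
  na <- length(acols)
  nb <- length(bcols)
  # All (a, b) column pairs in column-major order: the a index varies fastest
  both <- df[, rep(acols, nb), drop = FALSE] & df[, rep(bcols, each = na), drop = FALSE]
  # Count the rows where both columns are 1 and lay the counts out as a x b
  matrix(colSums(both), nrow = na, ncol = nb,
         dimnames = list(colnames(df)[acols], colnames(df)[bcols]))
}

tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))
tmp2$b4 <- c(1L, 1L, 0L)  # hypothetical extra b column

res <- cooc_counts(tmp2, acols = 1:3, bcols = 4:7)
print(res)
```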

Here's a possible solution:

aIdxs <- 1:3  # indices of the a columns
bIdxs <- 4:6  # indices of the b columns (e.g. 4:7 once a b4 column is added)

# init matrix
m <- matrix(0,
            nrow = length(aIdxs), ncol=length(bIdxs),
            dimnames = list(colnames(tmp2)[aIdxs],colnames(tmp2)[bIdxs]))

# create all combinations of a's and b's column indexes
idxs <- expand.grid(aIdxs,bIdxs)

# for each row and each (a, b) combination, add 1
# to the matrix if both the a and the b column are 1
for (r in seq_len(nrow(tmp2))) {
  m <- m + matrix(apply(idxs, 1, function(x) { all(tmp2[r, x] == 1) }),
                  nrow = length(aIdxs), byrow = FALSE)
}
> m
   b1 b2 b3
a1  1  1  0
a2  2  1  1
a3  0  0  1
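Each cell of m is a sum over rows of a_i * b_j, so for 0/1 data the whole matrix equals the matrix product t(A) %*% B, which base R's crossprod() computes without an explicit row loop. This is a different technique from the loop above, sketched here under the assumption that all columns hold only 0/1 values (with other values it gives sums of products, not counts); A, B and m2 are illustrative names:

```r
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

A <- as.matrix(tmp2[, 1:3])  # a columns
B <- as.matrix(tmp2[, 4:6])  # b columns; any number of b columns works

# crossprod(A, B) is t(A) %*% B: entry [i, j] = sum over rows of A[, i] * B[, j]
m2 <- crossprod(A, B)
print(m2)
```

Because the number of rows of the result comes from ncol(A) and the number of columns from ncol(B), the output need not be square.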
digEmAll

Another possible solution. Your input is a bit tricky for 'table': you have two sets of columns, 'a' and 'b', whose binary indicators in each row mark pairwise co-occurrences only between 'a' and 'b' columns, so you need to loop over both sets. Below is a generalized (if not very elegant) function that works with different numbers of 'a' and 'b' columns:

tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L),
                       a3 = c(0L, 1L, 0L), b1 = c(1L, 0L, 1L),
                       b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L)),
                  .Names = c("a1", "a2", "a3", "b1", "b2", "b3"),
                  class = "data.frame", row.names = c(NA, -3L))
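# For each a column p, multiply it by every b column q (1 only where both are 1),
# sum over rows, then transpose so that the a columns become the rows.
# "^a" and "^b" anchor the match so only names starting with a or b are picked.
fun = function(x) t(do.call("cbind", lapply(x[, grep("^a", colnames(x))],
    function(p) rowSums(do.call("rbind", lapply(x[, grep("^b", colnames(x))],
    function(q) q * p))))))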
fun(tmp2)
#> fun(tmp2)
#   b1 b2 b3
#a1  1  1  0
#a2  2  1  1
#a3  0  0  1

# let's do a bigger example
set.seed(1)
m = matrix(rbinom(size=1, n=50, prob=0.75), ncol=10,
           dimnames=list(paste("instance_", 1:5, sep=""),
                         c(paste("a", 1:4, sep=""), paste("b", 1:6, sep=""))))

# Note that the numbers of a and b columns are not equal
#> m
#           a1 a2 a3 a4 b1 b2 b3 b4 b5 b6
#instance_1  1  0  1  1  0  1  1  1  0  0
#instance_2  1  0  1  1  1  1  1  0  1  1
#instance_3  1  1  1  0  1  1  1  1  0  1
#instance_4  0  1  1  1  1  0  1  1  1  1
#instance_5  1  1  0  0  1  1  0  1  1  1

fun(as.data.frame(m))
#> fun(as.data.frame(m))
#   b1 b2 b3 b4 b5 b6
#a1  3  4  3  3  2  3
#a2  3  2  2  3  2  3
#a3  3  3  4  3  2  3
#a4  2  2  3  2  2  2
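To see what the nested lapply() calls in fun compute, here is the inner step for a single a column (a2) of the original tmp2. The names p, b_cols, prods and row_a2 only mirror fun's internals for illustration:

```r
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

p <- tmp2$a2                                  # one a column
b_cols <- tmp2[, grep("^b", colnames(tmp2))]  # all b columns

# q * p is 1 only in rows where both the b column and a2 are 1;
# rbind stacks the results into one row per b column
prods <- do.call("rbind", lapply(b_cols, function(q) q * p))
print(prods)

# Summing each row gives the a2 row of the final table
row_a2 <- rowSums(prods)
print(row_a2)
```

fun repeats this for every a column, binds the resulting vectors as columns and transposes, which is why the a columns end up as rows.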
Teemu Daniel Laajala