1

I have data in a dataframe in the following format:

id a b c d e f x y z
1  0 0 0 1 1 0 1 0 1 
2  1 0 0 1 0 0 1 0 0 

What I would like to do is find out how many times a and x have a 1 in the same place, a and y have a 1 in the same row and so on. Essentially, (a,b,c,d,e,f) and (x,y,z) ought to be grouped as two variables.

I have been trying to use the reshape package (using melt and cast ) to do this and the best I could come up with was something which looked like the following:

id a b c d e f x y z
1  1 0 0 0 1 0 10 5 3 
2  0 1 0 1 0 0 25 0 48

What I want to see is something like this:

  x    y    z
a 10   5    3
b 25   0    48

I really would appreciate your help on this. I am a newbie to the reshape package.

Regards,

Arun

3 Answers3

2

If your data is in data frame data, you could do:

mdata <- melt(data, measure.vars=c("a","b","c","d","e","f"))
mmdata <- melt(mdata, measure.vars=c("x","y","z"))
colnames(mmdata) <- c("var1","val1","var2","val2")
mmdata <- mmdata[mmdata$val1 & mmdata$val2,]
cast(mmdata, var1~var2)
danas.zuokas
  • 4,551
  • 4
  • 29
  • 39
1

This may be a solution, but there might be something simpler or cleaner. It's using a double sapply, which I didn't think I would use one day :-) :

Here are your data :

d <- data.frame(a=c(1,0,1),
                b=c(1,0,0),
                c=c(0,1,1),
                x=c(1,1,1),
                y=c(1,0,0),
                z=c(1,0,1))

First we split the data between the two groups of variables :

d1 <- d[,c("a","b","c")]
d2 <- d[,c("x","y","z")]

And here is the code :

tmptab <- function(v1,v2) {
  tab <- as.data.frame(table(v1,v2))
  result <- tab$Freq[tab$v1==1 & tab$v2==1]
  if (is.na(result)) result <- 0
  return(result)
}

sapply(d2, function(v) {
  sapply(d1, tmptab, v)
})

Which gives :

  x y z
a 2 1 2
b 1 1 1
c 2 0 1
juba
  • 47,631
  • 14
  • 113
  • 118
0

Digging up skeletons....

Here's a possible approach. Not sure if it's any improvement over the others though...

## Save ourselves some typing...
Group1 <- c("a", "b", "c", "d", "e")
Group2 <- c("x", "y", "z")

## melt the data
A <- melt(mydf, id.vars="id")

## Split and merge
B <- Reduce(function(x, y) merge(x, y, by = "id"), 
            split(A, A$variable %in% Group1))

## Subset
B.sub <- B[B$value.x == 1 & B$value.y == 1, c("variable.y", "variable.x")]

## Factor (so table works nicely)
## Without `factor`, rows or columns which are all zeroes
##   will be dropped during the tabulation, which may be fine
B.sub$variable.y <- factor(B.sub$variable.y, Group1)
B.sub$variable.x <- factor(B.sub$variable.x, Group2)

## Tabulate
table(B.sub)
#           variable.x
# variable.y x y z
#          a 1 0 0
#          b 0 0 0
#          c 0 0 0
#          d 2 0 1
#          e 1 0 1
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485