I have two matrices:

first.matrix  = matrix(c(1,1,2,3,3,2,4,1), nrow=2)
second.matrix = matrix(c(2,2,3,3,1,1,4,1), nrow=2)

and I want to find the correlation between each row of `first.matrix` and each row of `second.matrix`. In a Java-like style, I would write something like this:

results <- data.frame("first"=c(1:4),"second"=c(1:4), "cor"=c(1:4))
counter <- 1
for(i in 1:2) {
  a <- c(t(first.matrix[i,]))
  for(j in 1:2) {
    b <- c(t(second.matrix[j,]))    
    results$cor[counter] <- cor(a,b)
    results$first[counter]  <- i
    results$second[counter] <- j
    counter=counter+1
  }  
}

I'm trying to teach myself to code in R the "right" way, and have spent time reading R tutorials and questions here on Stack Overflow. So I know my solution requires the use of `apply`, but try as I might, I don't understand how to actually write it. All the examples I see are pretty simple and involve finding the sum or the mean of a column or row. The trouble is:

a. I need an 'apply' that can accommodate a function that receives two variables, a row from each matrix. This requires a little manipulation to retrieve the rows. I can solve this with:

c(t(first.matrix[i,]))

but I don't know how to insert that into the `apply` call.

b. I need results that tell me which row from the first matrix was compared with which row in the second matrix, and what the result is. In my example, running the code I wrote will result in:

  first second        cor
1     1      1  0.4000000
2     1      2 -0.6741999
3     2      1 -0.1348400
4     2      2  0.6363636

I don't care if the columns have names or not.

Any solution, hint, or reference will be really appreciated :-)

smci
nafrtiti
  • `cor(t(first.matrix), t(second.matrix))`, and then `reshape2::melt(cor(t(first.matrix), t(second.matrix)))` – user20650 Mar 19 '17 at 20:04
  • wow!! that's so elegant! If you write it as a solution I'll accept it. I had no idea that `cor` would work on a transposed matrix. Now I'm curious: what is the intuition behind that? How was I supposed to know that? I tried `cor` straight on the matrix and that didn't work of course. Thanks a lot! – nafrtiti Mar 20 '17 at 06:56
  • The general term for "use '*apply' family instead of loop/nested loops" is [tag:vectorization]. Please use that tag on this sort of question. And skim the existing questions. The general philosophy in R is that most functions should allow a vector/matrix argument instead of a single number, where it makes sense. – smci Mar 20 '17 at 13:35
  • @smci - I read (not skimmed, took the time to read) all questions I found on `apply`, and I read some tutorials as well. I also read the answers, the comments and the comments on the comments, and there almost always appears a comment such as yours. I did my best to ask the question in the "right way". Note that my "java-like" R code in the original question does use `cor` on a vector (I know that a function allows more than a number), but it didn't occur to me that I could transpose the matrix and then call the columns as vectors. Pretty obvious, when I rethink about it now. – nafrtiti Mar 20 '17 at 18:09
  • @nafrtiti : Sure, you didn't do anything wrong, I wasn't saying you had! Just trying to make sure you know 'vectorize' is the right keyword for next time, and the R philosophy that loops are generally to be avoided, and should be replaceable with vectors/matrices, and wrapped with whatever other functional machinery you need. It took me a lot of getting used to when I originally migrated from Python. Coming from Java is an even harder paradigm shift. – smci Mar 20 '17 at 18:15
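
The one-liner from the first comment works because `cor(x, y)` with two matrices correlates every column of `x` with every column of `y`; transposing turns the rows of interest into columns. Below is a minimal sketch of that idea, using base R instead of `reshape2::melt` to get the long format. The names `cor.mat` and `results` are illustrative and not from the original post.

```r
first.matrix  <- matrix(c(1,1,2,3,3,2,4,1), nrow = 2)
second.matrix <- matrix(c(2,2,3,3,1,1,4,1), nrow = 2)

# cor.mat[i, j] is the correlation between row i of first.matrix
# and row j of second.matrix (rows became columns after t()).
cor.mat <- cor(t(first.matrix), t(second.matrix))

# Long format: one line per (first, second) pair.
# row() and col() give the row and column index of every cell.
results <- data.frame(
  first  = as.vector(row(cor.mat)),
  second = as.vector(col(cor.mat)),
  cor    = as.vector(cor.mat)
)
results
```

Because matrices are stored column by column, the pairs come out with `first` varying fastest, which differs from the order produced by the nested loop in the question.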

2 Answers


One solution is to first `merge` the row indices of `first.matrix` with the row indices of `second.matrix` to get all the combinations you want. Because the two data frames share no columns, `merge` returns every pairing (a cross join), so this just gives you the row indexes to compare. Then you can use `sapply` to compute the correlation for each pair.

Something like:

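    # all pairs of row indices (no shared columns, so merge() is a cross join)
    res <- merge(data.frame(first  = seq_len(nrow(first.matrix))),
                 data.frame(second = seq_len(nrow(second.matrix))))
    # correlate the two rows named by each pair
    res$corr <- sapply(seq_len(nrow(res)), function(i) {
      cor(first.matrix[res$first[i], ], second.matrix[res$second[i], ])
    })
    res

    #   first second       corr
    # 1     1      1  0.4000000
    # 2     2      1 -0.1348400
    # 3     1      2 -0.6741999
    # 4     2      2  0.6363636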
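
On the question's point (a), a function that receives two variables: `mapply` is the member of the apply family that walks several vectors in parallel. The following is a sketch of the same merge-based approach rewritten with `mapply`; it is a variation for illustration, not the answerer's code.

```r
first.matrix  <- matrix(c(1,1,2,3,3,2,4,1), nrow = 2)
second.matrix <- matrix(c(2,2,3,3,1,1,4,1), nrow = 2)

# Cross join of the row indices, as in the answer above.
res <- merge(data.frame(first  = seq_len(nrow(first.matrix))),
             data.frame(second = seq_len(nrow(second.matrix))))

# mapply() calls the function once per pair, taking i from res$first
# and j from res$second at the same position.
res$corr <- mapply(
  function(i, j) cor(first.matrix[i, ], second.matrix[j, ]),
  res$first, res$second
)
res
```

Plain row indexing such as `first.matrix[i, ]` already returns a vector, so the `c(t(...))` wrapper from the question should not be needed here.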
Mike H.
  • I'm embarrassed to say I don't quite understand this solution, since I don't see a definition of `rows`, so don't understand how this works. – nafrtiti Mar 20 '17 at 07:25
  • @nafrtiti that is my fault, I'm sorry. I didn't update the names in the code. `rows` should have been `res`. Should work now. – Mike H. Mar 20 '17 at 12:10
  • thanks for the clarification! – nafrtiti Mar 20 '17 at 18:03

You could do this by combining expand.grid with apply. Use expand.grid to get a table with all the possible combinations of rows from the two matrices, then use apply to iterate your function over those combinations. Like:

apply(# get a table with all possible combinations of rows
      expand.grid(seq(nrow(first.matrix)), seq(nrow(second.matrix))),
      # apply the upcoming function row-wise
      1,
      # now run cor over those combos of row
      function(x) cor(x = first.matrix[x[1],], y = second.matrix[x[2],]))

Result:

[1]  0.4000000 -0.1348400 -0.6741999  0.6363636
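
The bare vector above does not say which rows were compared, which the question asks for in point (b). One possible way to keep that information is to store the grid first and attach the correlations as a column. This is a sketch under that assumption; the name `combos` and its column names are illustrative.

```r
first.matrix  <- matrix(c(1,1,2,3,3,2,4,1), nrow = 2)
second.matrix <- matrix(c(2,2,3,3,1,1,4,1), nrow = 2)

# Keep the grid of row-index pairs so each result stays labelled.
combos <- expand.grid(first  = seq_len(nrow(first.matrix)),
                      second = seq_len(nrow(second.matrix)))

# apply() passes each grid row as a named vector c(first = i, second = j).
combos$cor <- apply(combos, 1, function(x) {
  cor(first.matrix[x[["first"]], ], second.matrix[x[["second"]], ])
})
combos
```

`expand.grid` varies its first argument fastest, so the rows follow the same order as the result vector shown above.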
ulfelder