
I have a 2D matrix mat with 500 rows × 335 columns, and a data.frame dat with 120425 rows. The data.frame dat has two integer columns, I and J, which give the row and column indices into mat. I would like to add the corresponding values from mat to the rows of dat.

Here is my conceptual fail:

> dat$matval <- mat[dat$I, dat$J]
Error: cannot allocate vector of length 1617278737

(I am using R 2.13.1 on Win32.) Digging a bit deeper, I see that I'm misusing matrix indexing: instead of the one-dimensional vector of values I expected, I get a sub-matrix of mat, i.e.:

> str(mat[dat$I[1:100], dat$J[1:100]])
 int [1:100, 1:100] 20 1 1 1 20 1 1 1 1 1 ...

I was expecting something like int [1:100] 20 1 1 1 20 1 1 1 1 1 .... What is the correct way to index a 2D matrix using indices of row, column to get the values?
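A small sketch on made-up toy data (not the data from the question) showing why this happens: mat[I, J] selects every listed row crossed with every listed column, so it returns a length(I) × length(J) sub-matrix. The element-wise values the question wants are only its diagonal, which is why 120425 index pairs would need a 120425 × 120425 result.

```r
# Toy 4 x 4 matrix holding 1..16, filled column by column
mat <- matrix(1:16, nrow = 4)
I <- c(1, 2, 4)
J <- c(3, 3, 1)

# Row/column subsetting returns all combinations of I and J
sub <- mat[I, J]
print(dim(sub))    # 3 3, i.e. length(I) x length(J)

# The wanted values mat[I[k], J[k]] are only the diagonal of that sub-matrix
wanted <- diag(sub)
print(wanted)
```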

Henrik
Mike T
  • +1 for an interesting question (which raises another question: why isn't there an option to change the behavior to something a little more like this when passing the `[` operator N vectors for an N-dimensional matrix?) – Ari B. Friedman Aug 03 '11 at 01:11
  • Nice question - I edited it very slightly to fix what I *think* is a typo (`datI` to `dat$I`). If this isn't what you meant feel free to undo... – joran Aug 03 '11 at 01:16

4 Answers


Almost. The indices need to be passed to "[" as a two-column matrix:

dat$matval <- mat[ cbind(dat$I, dat$J) ] # should do it.

There is a caveat: although this also works when the object being indexed is a data frame, the data frame is first coerced to a matrix, and if any of its columns are non-numeric, the entire matrix becomes the "lowest common denominator" class (for example, character).
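A minimal sketch of the two-column index matrix on toy data (values made up for illustration), plus the caveat above: when a data frame is indexed with a matrix, it goes through as.matrix() first, so a single character column makes every returned value a character string.

```r
mat <- matrix(1:16, nrow = 4)
dat <- data.frame(I = c(1, 2, 4), J = c(3, 3, 1))

# Each row of the index matrix is one (row, column) pair -> one value per pair
dat$matval <- mat[cbind(dat$I, dat$J)]
print(dat)

# Caveat: matrix-indexing a data frame coerces it with as.matrix(),
# so the non-numeric column b turns the whole result into character
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
v <- df[cbind(c(1, 3), c(1, 1))]
print(v)
print(class(v))
```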

IRTFM
  • +1 for finding the way that R clearly intended to do things ;-) – Ari B. Friedman Aug 03 '11 at 01:18
  • So if `I` and `J` are the only columns, is just `mat[dat]` sufficient? Or do you need to coerce to a matrix? – joran Aug 03 '11 at 01:19
  • Seems coercion is necessary since the data frame is really a list. So you could also do `as.matrix(dat)`. – joran Aug 03 '11 at 01:21
  • @gsk3: Look at the Arguments section for ?"[" under "..." . When an array or matrix is being addressed, the matrix must have the same number of columns as the addressed object has dimensions. There are also some examples on that help page. – IRTFM Aug 03 '11 at 02:42
  • What happens if the data.frame contains index values for I and J that are outside the bounds of the matrix? I'm pretty sure it will fail...I think @Tommy's answer will return NAs for that scenario. Just something to keep in mind... – Chase Aug 03 '11 at 12:19
  • This indexing method is rather obscure, not covered in "Intro to R" tutorials. I got curious and read the docs, [which does cover it](https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Index-matrices) – Heisenberg Jun 03 '16 at 22:22
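A sketch of the two points raised in the comments, on toy data. First, for an N-dimensional array the index matrix simply has N columns. Second, on out-of-bounds indices the two approaches differ: the index-matrix form signals an error, while the 1-D linear index (from the answer below) only returns NA when the computed position is past length(mat); a row index that is too large but still lands inside the matrix silently returns a value from the next column.

```r
# A 3-D array needs a 3-column index matrix: one (i, j, k) triple per row
arr <- array(1:24, dim = c(2, 3, 4))
idx <- cbind(c(1, 2), c(3, 1), c(4, 2))
print(arr[idx])   # same as c(arr[1, 3, 4], arr[2, 1, 2])

# Out-of-bounds pair: row 5 does not exist in a 4 x 4 matrix
mat <- matrix(1:16, nrow = 4)
I <- c(1, 5)
J <- c(1, 1)

# Index-matrix form: an error, captured here as its message
res <- tryCatch(mat[cbind(I, J)], error = function(e) conditionMessage(e))
print(res)

# Linear index: (5, 1) maps to position 5, which is really mat[1, 2]
lin <- mat[I + (J - 1L) * nrow(mat)]
print(lin)

# Only positions beyond length(mat) come back as NA
print(mat[17])
```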

Using a matrix to index as DWin suggests is of course much cleaner, but for some strange reason doing it manually using 1-D indices is actually slightly faster:

# Huge sample data
mat <- matrix(sin(1:1e7), ncol=1000)
dat <- data.frame(I=sample.int(nrow(mat), 1e7, replace=TRUE),
                  J=sample.int(ncol(mat), 1e7, replace=TRUE))

system.time( x <- mat[cbind(dat$I, dat$J)] )     # 0.51 seconds
system.time( mat[dat$I + (dat$J-1L)*nrow(mat)] ) # 0.44 seconds

The dat$I + (dat$J-1L)*nrow(mat) part turns the 2-D indices into 1-D (column-major) ones. The 1L suffix specifies an integer rather than a double value, which avoids some coercions.
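A quick check on randomly generated toy data that the column-major conversion picks exactly the same elements as the two-column index matrix. R stores a matrix column by column, so element (i, j) sits at position i + (j - 1) * nrow; base R's arrayInd() goes the other way.

```r
set.seed(1)
mat <- matrix(rnorm(20), nrow = 4)   # 4 x 5 toy matrix
I <- sample.int(nrow(mat), 50, replace = TRUE)
J <- sample.int(ncol(mat), 50, replace = TRUE)

# Column-major linear index: the row index varies fastest
lin <- I + (J - 1L) * nrow(mat)

via_linear <- mat[lin]
via_matrix <- mat[cbind(I, J)]
print(identical(via_linear, via_matrix))

# Worked case: element (3, 2) of a 4-row matrix is at 3 + 1 * 4 = 7
print(mat[3, 2] == mat[7])

# arrayInd() converts linear positions back to (row, column) pairs
back <- arrayInd(lin, dim(mat))
```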

...I also tried gsk3's apply-based solution. It's almost 500x slower though:

system.time( apply( dat, 1, function(x,mat) mat[ x[1], x[2] ], mat=mat ) ) # 212
Tommy

Here's a one-liner using apply's row-based operations

> dat <- as.data.frame(matrix(rep(seq(4),4),ncol=2))
> colnames(dat) <- c('I','J')
> dat
   I  J
1  1  1
2  2  2
3  3  3
4  4  4
5  1  1
6  2  2
7  3  3
8  4  4
> mat <- matrix(seq(16),ncol=4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16

> dat$K <- apply( dat, 1, function(x,mat) mat[ x[1], x[2] ], mat=mat )
> dat
  I J  K
1 1 1  1
2 2 2  6
3 3 3 11
4 4 4 16
5 1 1  1
6 2 2  6
7 3 3 11
8 4 4 16
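On the same toy data, a short sketch confirming that the row-by-row apply() lookup and the vectorised two-column index matrix from the accepted answer return the same values; the index-matrix form avoids calling an R function once per row.

```r
dat <- as.data.frame(matrix(rep(seq(4), 4), ncol = 2))
colnames(dat) <- c("I", "J")
mat <- matrix(seq(16), ncol = 4)

# Row-by-row lookup; apply() converts dat to a matrix first
k_apply <- apply(dat, 1, function(x, mat) mat[x[1], x[2]], mat = mat)

# Vectorised lookup with a two-column index matrix
k_index <- mat[cbind(dat$I, dat$J)]

print(k_index)
print(identical(unname(k_apply), k_index))
```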
Ari B. Friedman
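n <- 10
# a symmetric n x n correlation matrix to index into
mat <- cor(matrix(rnorm(n*n), n, n))
# two-column (row, column) index matrix: one row per pair i < j
ix <- matrix(NA, n*(n-1)/2, 2)
k <- 0
for (i in 1:(n-1)) {
  for (j in (i+1):n) {
    k <- k + 1
    ix[k, 1] <- i
    ix[k, 2] <- j
  }
}
# matrix indexing returns one value per row of ix
o <- mat[ix]
out <- cbind(ix, o)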
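The double loop above builds the (i, j) pairs with i < j by hand. A possibly shorter sketch, assuming the order of the pairs does not matter, uses base R's upper.tri() together with which(..., arr.ind = TRUE). Note that which() returns the pairs in column-major order (grouped by j), not in the loop's order (grouped by i).

```r
set.seed(42)
n <- 10
mat <- cor(matrix(rnorm(n * n), n, n))

# (row, col) pairs of the strict upper triangle, in column-major order
ix2 <- which(upper.tri(mat), arr.ind = TRUE)

# One correlation per pair via matrix indexing
o2 <- mat[ix2]
out2 <- cbind(ix2, o = o2)
print(head(out2))

# Same values, same order, as plain upper-triangle extraction
print(identical(o2, mat[upper.tri(mat)]))
```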