I am trying to use the outer() function in R to build a matrix by evaluating a function on every pair of elements of a vector of length n. Specifically, let x be a vector of length n; I want to compare each pair of elements of x. To do so, I use the following naive implementation based on outer().

# example code

n <- 500
x <- rnorm(n) 
f <- function(x, y){
  as.numeric(x<y)+0.5*as.numeric(x==y)
}
#new.mat <- outer(seq_len(n), seq_len(n), f)  # originally posted by mistake
new.mat <- outer(x, x, f)  # edited

This implementation becomes extremely slow as n increases, and I would like to know a more efficient way of doing this. I would really appreciate any suggestions.

Thanks,
Alemu
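
For readers who want to see what the matrix contains, here is a minimal sketch on a tiny vector. The names `x_small` and `m` are made up for illustration and are not part of the original post. Entry (i, j) is 1 when x[i] < x[j], 0.5 when they are equal, and 0 otherwise, so each entry and its mirror image should sum to 1.

```r
# Same comparison function as in the question
f <- function(x, y) {
  as.numeric(x < y) + 0.5 * as.numeric(x == y)
}

# Hypothetical small input containing a tie
x_small <- c(1, 2, 2)

# m[i, j] = f(x_small[i], x_small[j])
m <- outer(x_small, x_small, f)
print(m)

# Each pair is scored once from each side, so m + t(m) is all ones
is_complementary <- all(m + t(m) == 1)
print(is_complementary)
```

Because of this symmetry, only the upper triangle carries independent information, which may be worth exploiting for large n.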

  • This is more appropriate for Stack Overflow. BTW - a few R packages contain functions for outer() written in native (C/C++) code, so they are guaranteed to be fast. See the function 'Outer()' in the 'Rfast' package for example. – compbiostats Jul 24 '19 at 18:31
  • [This answer](https://stackoverflow.com/a/50776685/5793905) was for a question in a different context, but it applies to your problem. Check in particular the part about "using custom distances"; you can register your R function with `proxy` and let it do the looping. The loop is evaluated in C by `proxy`, though it still has the overhead of evaluating the R code in your functions, but it might be considerably faster than `outer`. – Alexis Jul 24 '19 at 21:53
  • Another approach is to vectorize your function `f`. Given the demo-code you've provided, you could do this and pass a single `x` and the entire vector for `y`. However, testing `sapply(x, f, y = x)` is about the same speed as `outer(x, x, f)`. – Brian Jul 24 '19 at 22:32
  • @Brian OP's function already is vectorized. And `outer` does not loop. The problem is probably that outer creates an input matrix of dimension (n², 2) and an output matrix of dimension (n, n). If `n` is large these matrices are huge and memory management will be slow. – Roland Jul 25 '19 at 04:38
  • @Roland, that was my point, the function is already usable for vectors. But maybe my mental model of `outer` vs `sapply` is wrong: I thought `outer` would perform n^2 calls to `f`, for each single matrix element; vs `sapply` would perform n calls to `f`, each of which had `n`-length vector elements. – Brian Jul 25 '19 at 11:10
  • Correction in my example. outer(x, x, f) – Alemu Assefa Jul 25 '19 at 12:58
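
The exchange above about how often `f` is called can be checked directly. The sketch below is illustrative only: `f_counted`, `calls`, and the `m_*` variables are hypothetical names, and the manual expansion with `rep()` is an approximation of what `outer` does for plain vectors rather than its actual source. It counts the calls made by `outer` and by `sapply`, then rebuilds the same matrix by hand.

```r
set.seed(1)
n <- 50
x <- rnorm(n)

# Wrapper around the question's function that counts how often it is called
calls <- 0L
f_counted <- function(a, b) {
  calls <<- calls + 1L
  as.numeric(a < b) + 0.5 * as.numeric(a == b)
}

# outer() expands both inputs to length n^2 and calls the function once
m_outer <- outer(x, x, f_counted)
calls_outer <- calls

# sapply() calls the function once per element, each time with a length-n vector
calls <- 0L
m_sapply <- sapply(x, f_counted, b = x)
calls_sapply <- calls

# Manual version of the expansion: two length-n^2 vectors, one vectorized call
X <- rep(x, times = n)
Y <- rep(x, each = n)
m_manual <- matrix(f_counted(X, Y), nrow = n, ncol = n)

print(c(outer = calls_outer, sapply = calls_sapply))
# The sapply() result is the transpose, because x[j] is passed as the first argument
print(isTRUE(all.equal(m_outer, t(m_sapply))))
print(isTRUE(all.equal(m_outer, m_manual)))
```

The two length-n² temporaries `X` and `Y`, plus the intermediate vectors created inside `f`, are the memory cost referred to in the comment above. For n = 500 that is 250,000 elements per temporary, and it grows quadratically with n.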
