
I'm trying to perform the dot product on all possible combinations of vectors. I am able to find all the possible combinations. I just can't quite figure out how the FUN argument in combn() works. Below is my code, thanks for any help!

def=c("Normal.def","Fire.def","Water.def","Electric.def","Grass.def","Ice.def",
       "Fighting.def","Poison.def","Ground.def","Flying.def","Psychic.def","Bug.def",
       "Rock.def","Ghost.def","Dragon.def","Null.def")

combn(def,2,FUN=def%*%def,simplify=TRUE)
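(For context: `FUN` must be a *function* that `combn` applies to each combination in turn; `def %*% def` is instead evaluated once, up front. A minimal sketch of the contract, using made-up toy vectors rather than the real `.def` data:)

```r
# FUN must be a function; combn() calls it once per combination.
# Toy example (vectors a, b, c are made up for illustration):
vecs <- list(a = 1:3, b = 4:6, c = 7:9)
combn(names(vecs), 2,
      FUN = function(nm) sum(vecs[[nm[1]]] * vecs[[nm[2]]]))
# [1]  32  50 122    (the pairs a.b, a.c, b.c)
```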
Stedy
Brent Ferrier
    Can you post some sample input and output of what you hope to do? It's possible that `combn` isn't the best function for your task to begin with.... – A5C1D2H2I1M1N2O1R2T1 Apr 17 '14 at 18:11

2 Answers


Using @BrodieG's sample data, you can just use the crossprod function:

set.seed(1)
vec1 <- sample(1:10)
vec2 <- sample(1:10)
vec3 <- sample(1:10)

crossprod(cbind(vec1, vec2, vec3))
#      vec1 vec2 vec3
# vec1  385  298  284
# vec2  298  385  296
# vec3  284  296  385
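If you only want the distinct pairwise products rather than the full symmetric matrix, you can pull out and label the upper triangle (a small follow-up sketch; `outer` builds the labels in the same column-major order that `upper.tri` extracts):

```r
M <- crossprod(cbind(vec1, vec2, vec3))
# label the upper-triangle entries by the pair they came from;
# outer() keeps the labels in upper.tri()'s column-major order
nms <- outer(colnames(M), colnames(M), paste, sep = ".")
setNames(M[upper.tri(M)], nms[upper.tri(nms)])
# vec1.vec2 vec1.vec3 vec2.vec3
#       298       284       296
```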

Some benchmarks, out of curiosity:

The functions to run:

fun1 <- function() {
  # crossprod computes t(X) %*% X in a single call
  A <- crossprod(do.call(cbind, lst))
  A[upper.tri(A)]
}
fun2 <- function() {
  # explicit matrix multiply of the stacked vectors
  A <- do.call(rbind, lst) %*% do.call(cbind, lst)
  A[upper.tri(A)]
}
fun3 <- function() {
  # one %*% call per pair of vectors
  combn(
    seq_along(lst), 2, 
    FUN = function(idx) c(lst[[idx[[1]]]] %*% lst[[idx[[2]]]])
  )
}

Benchmarking on a small number of large vectors:

library(microbenchmark)

set.seed(1)
n <- 5
lst <- setNames(replicate(n, sample(1:100000), simplify = FALSE), 
                paste0("V", sequence(n)))

microbenchmark(fun1(), fun2(), fun3())
# Unit: milliseconds
#    expr       min        lq    median        uq      max neval
#  fun1()  6.909651  6.992031  8.432346  8.520301 74.12263   100
#  fun2() 17.290101 18.811134 19.144601 21.292544 88.10602   100
#  fun3() 22.841209 24.283113 24.427876 25.820158 91.14007   100
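As a sanity check (not part of the timings), all three functions return the same set of dot products. Note that `fun1`/`fun2` emit the upper triangle in column-major order while `fun3` follows `combn`'s pair order, so once there are more than 3 vectors they only match element-for-element after sorting:

```r
# the two orderings diverge for n > 3, so compare as sorted vectors
stopifnot(isTRUE(all.equal(sort(fun1()), sort(fun2()))),
          isTRUE(all.equal(sort(fun1()), sort(fun3()))))
```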

I wasn't patient enough to run `microbenchmark` on a medium number of medium-sized vectors, so here are single timings instead:

set.seed(1)
n <- 1000
lst <- setNames(replicate(n, sample(1:1000), simplify = FALSE), 
                paste0("V", sequence(n)))

system.time(fun1())
#   user  system elapsed 
#  0.245   0.004   0.251 

system.time(fun2())
#   user  system elapsed 
#  0.407   0.016   0.425 

system.time(fun3())
#   user  system elapsed 
# 14.216   0.004  14.339 
A5C1D2H2I1M1N2O1R2T1
  • `crossprod` seems to come consistently on top, but I was able to get `combn` faster than `t(mx) %*% mx` (where `mx == cbind(...)`) with three vectors 1e6 long each. I'm surprised there is so much of a difference between `crossprod` and `t(mx) %*% mx` since they do the same thing and the latter is pretty much all internal c code; also, the transpose only accounts for a relatively small portion of the difference between the two. – BrodieG Apr 17 '14 at 18:56
  • Actually, from `?crossprod`, as you probably saw: This is formally equivalent to (but usually slightly faster than) the call t(x) %*% y (crossprod) or x %*% t(y) (tcrossprod). But the difference is actually substantial. – BrodieG Apr 17 '14 at 19:00
  • @BrodieG, Unfortunately, one thing I don't like about the help pages in R is there is no timeline. While we might bump up version numbers of packages and so on, that doesn't mean the documentation changes nor that all the functions changed. Who knows when or under what conditions that documentation was written, or when (if at all) the function was made faster. – A5C1D2H2I1M1N2O1R2T1 Apr 17 '14 at 19:04
  • Aha, `crossprod(mx, mx)` is twice as slow as `crossprod(mx)` (at least in the 3 column example I referenced). They must have been referencing the first use case when comparing to `t(mx) %*% mx`. But yes, I too often find lack of detail in documentation exasperating. – BrodieG Apr 17 '14 at 19:13
  • Is there a way to adopt this to do the dot product of 12 vectors and not just 2? – Brent Ferrier Apr 17 '14 at 22:49
  • @user3453510, I don't really understand your comment. The answer above shows it being done with 3 vectors, a list of 5 vectors, and a list of 1000 vectors! – A5C1D2H2I1M1N2O1R2T1 Apr 18 '14 at 01:52
  • Here is a link to the project I have been working on in much more detail: http://stackoverflow.com/questions/23156549/dot-product-of-multiple-vectors-in-r-to-optimize-pokemon-teams – Brent Ferrier Apr 18 '14 at 14:51

Why don't you just matrix-multiply the whole thing? For example:

set.seed(1)
vec1 <- sample(1:10)
vec2 <- sample(1:10)
vec3 <- sample(1:10)

rbind(vec1, vec2, vec3) %*% cbind(vec1, vec2, vec3)

produces:

     vec1 vec2 vec3
vec1  385  298  284
vec2  298  385  296
vec3  284  296  385

Each cell of the matrix is the dot product of the two vectors named in its row and column labels. Alternatively, if you really want to do it with `combn`:

vec.lst <- list(vec1, vec2, vec3)
combn(
  seq_along(vec.lst), 2, 
  FUN=function(idx) c(vec.lst[[idx[[1]]]] %*% vec.lst[[idx[[2]]]])
)

Which produces:

[1] 298 284 296

Notice how those numbers correspond to the upper triangle of the matrix. For small data sets the matrix-multiply approach is much faster. For large ones, particularly where the vectors are very long but there aren't many of them, the `combn` approach might be faster, since it only computes the upper triangle rather than the full symmetric matrix.
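To keep track of which pair each number belongs to, the same `combn` call can generate matching labels (a sketch; the `vec1`/`vec2`/`vec3` names are just whatever you choose to call the list elements):

```r
vec.lst <- setNames(list(vec1, vec2, vec3), c("vec1", "vec2", "vec3"))
res <- combn(
  seq_along(vec.lst), 2,
  FUN = function(idx) c(vec.lst[[idx[1]]] %*% vec.lst[[idx[2]]])
)
# label each product by its pair; values and labels share combn's order
names(res) <- combn(names(vec.lst), 2, paste, collapse = ".")
res
# vec1.vec2 vec1.vec3 vec2.vec3
#       298       284       296
```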

BrodieG
  • I haven't been able to come up with sample data where `combn` comes out remotely on top. `crossprod` takes the first place, but the `cbind`/`rbind` approach you propose is efficient too. – A5C1D2H2I1M1N2O1R2T1 Apr 17 '14 at 18:39