55

How do I find the Euclidean distance between two vectors?

x1 <- rnorm(30)
x2 <- rnorm(30)
zx8754
Jana

5 Answers

72

Use the dist() function, but you need to form a matrix from the two inputs for the first argument to dist():

dist(rbind(x1, x2))

For the input in the OP's question we get:

> dist(rbind(x1, x2))
        x1
x2 7.94821

a single value that is the Euclidean distance between x1 and x2.
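As a minimal sketch of the point above (the seed and the variable names `d` and `d_scalar` are assumptions added for illustration, not part of the original answer), the following shows that `dist()` returns an object of class "dist" holding one value for two row vectors, how to extract it as a plain number, and why `cbind()` would instead produce many pairwise distances:

```r
set.seed(42)                 # assumed seed, only so the sketch is reproducible
x1 <- rnorm(30)
x2 <- rnorm(30)

# rbind() makes a 2 x 30 matrix: two observations, so one pairwise distance
d <- dist(rbind(x1, x2))
class(d)                     # "dist"

# Strip the "dist" attributes to get a bare numeric value
d_scalar <- as.numeric(d)
all.equal(d_scalar, sqrt(sum((x1 - x2)^2)))   # TRUE

# cbind() makes a 30 x 2 matrix: 30 observations, choose(30, 2) distances
length(dist(cbind(x1, x2)))  # 435
```

If a bare number is needed downstream, `as.numeric()` on the result is one simple way to get it.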

Gavin Simpson
  • Shouldn't I get a single distance measure as the answer? Your solution gives me a matrix. – Jana Apr 05 '11 at 22:14
  • With the above sample data, the result is a single value. The comment asking for "a single distance measure" may have resulted from using a different data structure?! – BurninLeo Mar 12 '19 at 09:53
  • @Jana I have no idea how you are getting a matrix back from `dist()` if `x1` and `x2` are just two vectors as per the OP. Did you use `cbind()`? – Gavin Simpson Mar 12 '19 at 17:34
46

As defined on Wikipedia, this should do it.

euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))

There's also the rdist function in the fields package that may be useful.


EDIT: Changed ** operator to ^. Thanks, Gavin.
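A short usage sketch for the function above. One thing to be aware of is that R recycles the shorter vector when the lengths differ, sometimes without a warning, so the one-liner can return a number for mismatched inputs. The guarded variant `euc.dist.checked` below is a hypothetical name introduced here for illustration, not part of the original answer:

```r
euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))

# A 3-4-5 right triangle gives distance 5
euc.dist(c(0, 0), c(3, 4))          # 5

# Length mismatch: c(1, 2) is silently recycled to c(1, 2, 1, 2)
euc.dist(c(1, 2, 3, 4), c(1, 2))    # sqrt(8), about 2.828427

# Hypothetical guarded variant that rejects mismatched or non-numeric input
euc.dist.checked <- function(x1, x2) {
  stopifnot(is.numeric(x1), is.numeric(x2), length(x1) == length(x2))
  sqrt(sum((x1 - x2) ^ 2))
}
euc.dist.checked(c(0, 0), c(3, 4))  # 5
```

Whether the extra check is worth it depends on how much the inputs can be trusted.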

Omar Wagih
Erik Shilts
  • On my Ubuntu box `dist(rbind(x1,x2))` is three times faster. – JohnTortugo Oct 26 '13 at 22:40
  • I just tried this on R 3.0.2 on Ubuntu, and this method is about 12 times faster for me than the `dist(rbind())` method. (Testing with `system.time({a <- c(2,6,78); b <- c(4,6,2); for (i in 1:1000000) {dist(rbind(a,b))} })`) – naught101 Sep 15 '14 at 02:03
17

Try using this:

sqrt(sum((x1-x2)^2))
Chase
so12311
3

If you want to use less code, you can also use the norm function in the base package (the 'F' stands for Frobenius; for a single-column matrix, the Frobenius norm is the Euclidean norm):

norm(matrix(x1-x2), 'F')

While this may look a bit neater, it's not faster. Indeed, a quick test on very large vectors shows little difference, though so12311's method is slightly faster. We first define:

set.seed(1234)
x1 <- rnorm(300000000)
x2 <- rnorm(300000000)

Then testing for time yields the following:

> system.time(a<-sqrt(sum((x1-x2)^2)))
user  system elapsed 
1.02    0.12    1.18 
> system.time(b<-norm(matrix(x1-x2), 'F'))
user  system elapsed 
0.97    0.33    1.31 
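To check that the two expressions timed above agree, here is a small sketch on much shorter vectors (the length of 1000 and the names `a` and `b` reused from the timing calls are choices made for this illustration). `matrix()` turns the difference into a single-column matrix, which is the case where the Frobenius norm coincides with the Euclidean norm:

```r
set.seed(1234)
x1 <- rnorm(1000)            # far smaller than the benchmark, for a quick check
x2 <- rnorm(1000)

a <- sqrt(sum((x1 - x2)^2))
b <- norm(matrix(x1 - x2), 'F')

dim(matrix(x1 - x2))         # 1000 1: one column, one entry per element
all.equal(a, b)              # TRUE
```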
JJJ
0

If you need to quickly calculate the Euclidean distance between one vector and a matrix of many vectors, then you can use the tcrossprod method from this answer:

bench=function(...,n=1,r=3){
  a=match.call(expand.dots=F)$...
  t=matrix(ncol=length(a),nrow=n)
  for(i in 1:length(a))for(j in 1:n){t1=Sys.time();eval(a[[i]],parent.frame());t[j,i]=Sys.time()-t1}
  o=t(apply(t,2,function(x)c(median(x),min(x),max(x),mean(x))))
  round(100*`dimnames<-`(o,list(names(a),c("median","min","max","mean"))),r)
}

es=3:6
r=sapply(es,function(e){
  m=matrix(rnorm(10^e),ncol=10)
  v=rnorm(10)
  bench(n=10,
    tcrossprod={sqrt(outer(rowSums(m^2),rowSums(t(v)^2),"+")-tcrossprod(m,2*t(v)))},
    Rfast_dista={Rfast::dista(m,t(v))},
    vectorized={sapply(colSums((v-t(m))^2),sqrt)},
    regular={apply(m,1,function(x)sqrt(sum((v-x)^2)))},
    dotproduct={apply(m,1,function(x){q=v-x;sqrt(q%*%q)})},
    norm={apply((v-t(m)),2,function(x)norm(as.matrix(x),"F"))},
    rbind_single={apply(m,1,function(x)dist(rbind(v,x))[1])},
    rbind_all={if(e<=5)unname(as.matrix(dist(rbind(v,m)))[1,-1])})[,1]
})

colnames(r)=paste0("1e",es)
r[8,4]=NA
round(r,3)

Output:

               1e3   1e4     1e5     1e6
tcrossprod   0.004 0.012   0.093   0.972
Rfast_dista  0.003 0.013   0.115   1.148
vectorized   0.006 0.038   0.366   3.835
regular      0.021 0.181   1.901  20.437
dotproduct   0.020 0.183   2.010  24.560
norm         0.057 0.562   6.017  62.532
rbind_single 0.159 1.592  16.982 181.834
rbind_all    0.036 3.493 530.259      NA
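The tcrossprod row relies on expanding the squared distance between a matrix row m_i and the vector v as |m_i|^2 + |v|^2 - 2 (m_i . v), so the expensive part becomes a single matrix product. The sketch below illustrates that identity on a small example, written with `%*%` rather than `tcrossprod()` for readability. The names `d2`, `d_fast` and `d_ref`, the seed, and the `pmax()` clamp are assumptions added here and are not part of the benchmark; the clamp guards against tiny negative values from floating-point rounding, which would otherwise turn into NaN under `sqrt()`:

```r
set.seed(1)                          # assumed seed for reproducibility
m <- matrix(rnorm(50), ncol = 10)    # 5 rows, each a 10-dimensional vector
v <- rnorm(10)

# Squared distances via |m_i|^2 + |v|^2 - 2 * (m_i . v)
d2 <- rowSums(m^2) + sum(v^2) - 2 * drop(m %*% v)

# Clamp at zero before the square root (added safeguard, see lead-in)
d_fast <- sqrt(pmax(d2, 0))

# Reference: the row-by-row "regular" method from the benchmark
d_ref <- apply(m, 1, function(x) sqrt(sum((v - x)^2)))

all.equal(d_fast, d_ref)             # TRUE
```

The expansion trades a little numerical accuracy for speed, so it may be less suitable when the distances involved are very close to zero.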
nisetama