
I am working in R to compute the distance matrix of a large matrix. The matrix has 39,900 rows and 1,990 columns:

set.seed(123)
# Simulated data: 39,900 rows x 1,990 columns
M <- matrix(rnorm(39900*1990),nrow = 39900,ncol = 1990)

The issue appears when I want to compute the distance matrix:

# Euclidean distances between all pairs of rows
d <- dist(M,method = 'euclidean')

On a computer with an Intel Core i3 processor and 8 GB of RAM, running 64-bit R, more than 24 hours have elapsed and the distance matrix has still not been computed.
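A quick back-of-the-envelope check (my own arithmetic, not part of the original post) suggests memory is a large part of the problem. `dist()` stores one double (8 bytes) for each of the n(n - 1)/2 pairs of rows, which on an 8 GB machine leaves little room for anything else:

```r
# Back-of-the-envelope memory estimate for dist(M), assuming 8-byte doubles.
n <- 39900                          # rows of M (points to compare)
p <- 1990                           # columns of M (coordinates per point)
n_pairs <- n * (n - 1) / 2          # dist() keeps only the lower triangle
dist_gib <- n_pairs * 8 / 1024^3    # size of the dist object in GiB
input_gib <- n * p * 8 / 1024^3     # size of M itself in GiB
terms <- n_pairs * p                # squared differences summed over all pairs
cat(sprintf("pairs: %.0f\ndist object: %.2f GiB\nM: %.2f GiB\nterms: %.3g\n",
            n_pairs, dist_gib, input_gib, terms))
```

This works out to about 796 million pairs, roughly 5.9 GiB for the `dist` object plus about 0.6 GiB for `M`, so the machine is likely to start swapping long before the computation finishes.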

Is there any way to speed up the computation, perhaps with Rcpp or another method? I need the full distance matrix, and other solutions on this site have not solved the problem.
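One commonly used alternative to `dist()` (not from this thread, and untested at the full 39,900-row scale) is the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, which turns most of the work into a matrix product that R hands to its BLAS through `tcrossprod()`. A dense 39,900 × 39,900 result would need about 11.9 GiB, so this sketch works in row blocks. `euclid_block()` and `block_size` are hypothetical names chosen for illustration, and the result is checked against `dist()` on a small random matrix:

```r
# Sketch: Euclidean distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * x.y,
# computed one block of rows at a time so the full n x n matrix is never needed at once.
# euclid_block() is a hypothetical helper written for this example.
euclid_block <- function(X, idx, sq = rowSums(X^2)) {
  G <- tcrossprod(X[idx, , drop = FALSE], X)  # dot products, length(idx) x nrow(X)
  D2 <- outer(sq[idx], sq, "+") - 2 * G       # squared distances
  sqrt(pmax(D2, 0))                           # clamp tiny negatives caused by rounding
}

set.seed(123)
X <- matrix(rnorm(300 * 40), nrow = 300)      # small stand-in for M
sq <- rowSums(X^2)                            # squared row norms, computed once
block_size <- 100                             # rows per block (tune to available RAM)
blocks <- split(seq_len(nrow(X)), ceiling(seq_len(nrow(X)) / block_size))

D <- matrix(0, nrow(X), nrow(X))              # only feasible here because X is small
for (idx in blocks) {
  D[idx, ] <- euclid_block(X, idx, sq)        # on real data, consume or save each block instead
}
diag(D) <- 0                                  # self-distances are exactly zero

ref <- as.matrix(dist(X, method = "euclidean"))
max_err <- max(abs(D - ref))                  # agreement with dist()
```

On the real data each 100 × 39,900 block is only about 30 MiB; what to do with each block (write it to disk, keep nearest neighbours, and so on) depends on the downstream analysis. Two caveats: the identity can lose precision for nearly identical rows, and `hclust()` still needs the complete `dist` object in memory.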

Duck

1 Answer

Perhaps try the distances package: https://cran.r-project.org/web/packages/distances/distances.pdf

install.packages("distances")
library("distances")
set.seed(123)
M <- matrix(rnorm(39900*1990),nrow = 39900,ncol = 1990)
d <- distances(M)  # d is a "distances" object, not a base R "dist" object
Alex Reynolds
  • This worked amazingly! Just curious: can the resulting object be fed into a clustering algorithm like `hclust()`? – Duck Aug 26 '21 at 23:25
  • From the documentation, you get a `distances` object from calling `distances::distances()`. It looks like you can pass a `distances` object to `distances::distance_matrix()` and get a `dist` object as output, which `hclust` can consume. – Alex Reynolds Oct 20 '21 at 23:14
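The workflow described in these comments can be sketched as follows on a small random matrix, assuming the `distances` package is installed. The data size, `method = "complete"` and `k = 3` are illustrative choices, not from the thread. Whatever `distances()` does internally, a `dist` object for all 39,900 rows of `M` holds about 796 million doubles (roughly 5.9 GiB), so this conversion is only practical on the full data if that fits in RAM.

```r
# Sketch: distances object -> base R dist object -> hierarchical clustering.
library(distances)

set.seed(123)
X <- matrix(rnorm(100 * 5), nrow = 100)   # small stand-in for M

d_obj  <- distances(X)                    # a "distances" object
d_dist <- distance_matrix(d_obj)          # converted to a base R "dist" object
hc     <- hclust(d_dist, method = "complete")
groups <- cutree(hc, k = 3)               # cluster label for each row of X
table(groups)
```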