I have a large dataset (~188,000 rows). I want to calculate the pairwise distances between its rows so that I can apply the hclust function to determine the cluster centers, and then apply the kmeans function to classify my data.
My problem is with the first step, calculating the distance matrix. Using the dist function from the stats package gave me this error:
Error: cannot allocate vector of size 132.0 Gb
This is clearly a RAM problem: the full distance matrix is too large to fit in memory.
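To show where a figure of that size comes from, here is a rough back-of-the-envelope estimate. It assumes exactly 188,000 rows (my count is only approximate), that dist stores one 8-byte double for each of the n(n-1)/2 row pairs, and that R's "Gb" means 1024^3 bytes. Under those assumptions the result is about 131.7, close to the 132.0 Gb in the error.

```r
# Rough memory estimate for the object returned by dist().
# Assumption: exactly 188000 rows (the real count is only approximate).
n <- 188000

# dist() keeps only the lower triangle: one value per pair of rows.
n_pairs <- n * (n - 1) / 2

# Assumption: each distance is stored as an 8-byte double.
bytes <- n_pairs * 8

# Assumption: R reports "Gb" in units of 1024^3 bytes.
gib <- bytes / 1024^3

print(round(gib, 1))
```

The size grows with the square of the number of rows, so halving the dataset would still need roughly a quarter of this amount.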
I need to find another way to calculate my distance matrix, or to get the cluster centers without holding the whole matrix in memory.
Any clear answer would be very helpful.