
I want to calculate the Euclidean distance between each pair of elements in a two-dimensional ArrayList in Java. The list holds 40000 records with 40 dimensions each. I ran into an out-of-memory error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

I increased the heap size to -Xmx16000M (16 GB of RAM), but the problem persists. How can I get rid of the out-of-memory error? The pseudocode below describes exactly what my code does. Thanks to everyone who responds.

// Assumes: import java.util.ArrayList;
ArrayList<ArrayList<Double>> dataset = readDataset(); // returns 40000 records of 40 dimensions each
int n = dataset.size();
double[][] distanceMatrix = new double[n][n]; // n * n doubles: about 12.8 GB for n = 40000

for (int i = 0; i < n; i++) {
    distanceMatrix[i][i] = 0.0; // distance from a record to itself
    // Start at i + 1 so that each unordered pair is computed exactly once.
    for (int j = i + 1; j < n; j++) {
        double ans = getDistance(dataset.get(i), dataset.get(j));
        distanceMatrix[i][j] = ans;
        distanceMatrix[j][i] = ans; // the matrix is symmetric
    }
}

public double getDistance(ArrayList<Double> a, ArrayList<Double> b) {
    double sum = 0;
    // Loop over the record length (40 here) instead of a hard-coded constant.
    for (int i = 0; i < a.size(); i++) {
        double diff = a.get(i) - b.get(i); // squaring makes Math.abs unnecessary
        sum += diff * diff;
    }
    return Math.sqrt(sum);
}
  • You have 40000x40000 distances each as a double with 8 bytes. That alone is around 12 GB. I think the best solution would be to not keep the results in memory. Instead write them directly in a file or a database. – mayamar Jan 13 '19 at 08:06
  • Without a heap dump taken at the time of the OutOfMemoryError, we can only speculate. And 9 times out of 10, when I speculate, I am wrong. – Joe C Jan 13 '19 at 08:14
  • Isn’t a square matrix very inefficient here since half the matrix is just a mirror of the other half with a diagonal line of 0’s between them? – Joakim Danielson Jan 13 '19 at 08:57
  • If you really need to know the distance between *all* pairs, the best you can do is either to store them all (perhaps exploiting symmetry), or calculate them on-the-fly. But you may not need to know all pairs, as pairs of elements that are "far away" could be uninteresting; this strongly depends upon the problem you are trying to solve. – Andy Turner Jan 13 '19 at 09:09
  • @JoeC *And 9 times out of 10, when I speculate, I am wrong.* Nah, that's speculation. It's rather 8 times out of 10. Hehe. On topic: The matrix is symmetric, so storing only half of it could be done with ~6.4 GB. Using `float` instead of `double` should be fine for most use cases, bringing it down to ~3.2 GB. Doing dirty tricks could allow you to squeeze a "low-precision `float`" into a (16-bit) `short`, yielding ~1.6 GB. Details depend on the goal, as Andy already said. – Marco13 Jan 13 '19 at 13:21
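
The symmetry and `float` suggestions in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's code: the class name `PackedDistanceMatrix` and its methods are hypothetical, and the data is assumed to be available as a `double[][]` rather than nested `ArrayList`s. It keeps only the pairs with i < j in one flat `float[]`, which for 40000 records is 799,980,000 entries, or roughly 3.2 GB.

```java
/** Hypothetical helper: packs the strict upper triangle of a symmetric distance matrix into floats. */
final class PackedDistanceMatrix {
    private final float[] data; // one entry per pair (i, j) with i < j, stored row by row

    PackedDistanceMatrix(int n) {
        long pairs = (long) n * (n - 1) / 2;
        if (pairs > Integer.MAX_VALUE - 8) {
            // A single Java array cannot hold this many elements.
            throw new IllegalArgumentException("Too many pairs for one array: " + pairs);
        }
        this.data = new float[(int) pairs];
    }

    /** Flat index of the pair (i, j); requires i < j. */
    private int index(int i, int j, int n) {
        // Row i starts after all entries of rows 0..i-1, which hold (n - 1 - k) entries each.
        long rowStart = (long) i * (n - 1) - (long) i * (i - 1) / 2;
        return (int) (rowStart + (j - i - 1));
    }

    /** Builds the packed matrix from points given as rows of equal length (an assumption). */
    static PackedDistanceMatrix fromPoints(double[][] points) {
        int n = points.length;
        PackedDistanceMatrix m = new PackedDistanceMatrix(n);
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                m.data[m.index(i, j, n)] = (float) euclidean(points[i], points[j]);
            }
        }
        m.size = n;
        return m;
    }

    private int size; // number of points, set by fromPoints

    /** Distance between points i and j, in either order; 0 on the diagonal. */
    float get(int i, int j) {
        if (i == j) {
            return 0f;
        }
        return i < j ? data[index(i, j, size)] : data[index(j, i, size)];
    }

    private static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

A caller would build the structure once with `PackedDistanceMatrix.fromPoints(points)` and then read any pair with `get(i, j)`. Whether `float` precision is enough, or whether all pairs are needed at all, depends on what the distances are used for, as the comments point out.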
