
I have a 60000 by 300 matrix called X. I am trying to find the pairwise Euclidean distances between its rows. I know that the pdist function in MATLAB (Statistics Toolbox) can do this. However, when I run pdist(X), I get the following error message:

Error using pdistmex
Out of memory. Type HELP MEMORY for your options.

Error in pdist (line 252)
    Y = pdistmex(X',dist,additionalArg);

Any advice for fixes? Is the matrix size too big?

Oleg
user2521074
  • You have exactly `nchoosek(6e4,2)` = 1799970000 pairwise combinations which, at 8 bytes per double, total 1799970000 * 8 bytes ≈ 13.41 GiB – Oleg Jul 19 '13 at 22:30
  • Is it bad that I did that calculation in wolframalpha? :) – voxeloctree Jul 19 '13 at 22:35
  • It would be better to sit down and think about why you want such distances. Is it to find a minimum or maximum? Perhaps calculating them on demand would be wiser... – Gallium Nitride Jul 21 '13 at 16:20
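The arithmetic in Oleg's comment can be checked with a few lines of Python (the thread itself is about MATLAB; Python is only used here as a calculator). This sketch assumes one 8-byte double per pair, which is what pdist returns, and ignores the memory needed for X itself and for any temporaries.

```python
from math import comb

def condensed_bytes(n_rows, bytes_per_value=8):
    """Return (number of unordered row pairs, bytes to store one double per pair)."""
    pairs = comb(n_rows, 2)  # same count as MATLAB's nchoosek(n_rows, 2)
    return pairs, pairs * bytes_per_value

pairs, nbytes = condensed_bytes(60_000)
print(pairs)                     # 1799970000
print(nbytes)                    # 14399760000
print(round(nbytes / 2**30, 2))  # 13.41 (GiB)
```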

1 Answer


Simply put, yes: the pdist method is hungry for memory and your computer cannot feed it. For example, even with a 6000 by 300 matrix X, running whos X Y after Y = pdist(X) gives the following variable sizes:

>> whos X Y
  Name         Size                      Bytes  Class     Attributes

  X         6000x300                  14400000  double              
  Y            1x17997000            143976000  double    
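The Y above is the condensed form: one entry per unordered pair, so an m-row X gives a 1 by m(m-1)/2 vector (6000·5999/2 = 17,997,000). MATLAB documents the order as (2,1), (3,1), ..., (m,1), (3,2), ..., (m,m-1). A small Python sketch of that index mapping shows where one pair's distance lives without building the square matrix; it uses 0-based row indices, an assumption of the sketch rather than MATLAB's 1-based convention, and the function names are hypothetical.

```python
def condensed_length(m):
    """Number of entries in a pdist-style condensed distance vector for m rows."""
    return m * (m - 1) // 2

def condensed_index(i, j, m):
    """0-based position of the (i, j) distance in a condensed vector ordered
    (1,0), (2,0), ..., (m-1,0), (2,1), ..., (m-1,m-2), i.e. column by column
    through the strict lower triangle."""
    if i == j:
        raise ValueError("the diagonal (a row's distance to itself) is not stored")
    lo, hi = min(i, j), max(i, j)
    # Skip the entries of all columns before `lo`, then step down to row `hi`.
    return m * lo - lo * (lo + 1) // 2 + (hi - lo - 1)

print(condensed_length(6000))               # 17997000
print(condensed_index(5999, 5998, 6000))    # 17996999 (the last entry)
```

To translate back to MATLAB's numbering, add 1 to the row indices and to the returned position.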

Now my memory reports (on a 32-bit machine):

>> memory
        Maximum possible array:             677 MB (7.101e+008 bytes) *

So I am really pushing the memory limits with Y = pdist(X): it produces an array of roughly 1.44 × 10^8 bytes, while the maximum possible array size is only about 5 times that. With a much bigger matrix, your system may not cope. Your 60000 by 300 matrix would produce a Y array of 1799970000 values, about 1.44 × 10^10 bytes, roughly 20 times the largest array this machine can allocate!
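The same comparison can be turned around to ask how many rows X could have before Y alone exceeds the largest array MATLAB reports. This Python sketch takes the 7.101e8-byte figure from the memory output above as its budget and deliberately ignores X itself and any temporary copies pdist makes, so the practical limit is lower; the function name is hypothetical.

```python
from math import isqrt

def max_rows_for_budget(max_array_bytes, bytes_per_value=8):
    """Largest m whose m*(m-1)/2 condensed vector of doubles fits in max_array_bytes."""
    max_values = max_array_bytes // bytes_per_value
    # Closed-form estimate from m*(m-1)/2 <= max_values, then correct by +/- 1.
    m = (1 + isqrt(1 + 8 * max_values)) // 2
    while m * (m - 1) // 2 > max_values:
        m -= 1
    while (m + 1) * m // 2 <= max_values:
        m += 1
    return m

budget = 710_100_000  # "Maximum possible array: 677 MB (7.101e+008 bytes)"
print(max_rows_for_budget(budget))  # 13324
```

So on that machine, about 13,000 rows is the ceiling for Y alone, far below 60,000.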

There may be workarounds if you really need to compute the Euclidean distances for a matrix this size; if so, I might be able to help you more.
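One workaround, in the spirit of the "calculate them on demand" comment, is never to hold all pairs at once: process X in blocks of rows, compute each block's distances to every row, use them, and discard them. The Python sketch below finds each row's nearest other row this way. It is pure standard library and therefore slow (a practical version would be vectorized MATLAB or NumPy), and both the nearest-neighbor goal and the block_size parameter are illustrative assumptions, not something the thread specifies. Extra memory is about block_size × n distances instead of n(n-1)/2.

```python
from math import dist, inf

def nearest_neighbors_blockwise(X, block_size=256):
    """For each row of X (a list of equal-length sequences), return the index of its
    nearest other row by Euclidean distance (-1 if X has a single row), computing
    distances one block of rows at a time instead of storing all pairs."""
    n = len(X)
    result = [-1] * n
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Distances from rows start..stop-1 to every row: at most block_size x n values.
        block = [[dist(X[i], X[j]) for j in range(n)] for i in range(start, stop)]
        for offset, row in enumerate(block):
            i = start + offset
            best_j, best_d = -1, inf
            for j, d in enumerate(row):
                if j != i and d < best_d:
                    best_j, best_d = j, d
            result[i] = best_j
        del block  # free this block's distances before building the next one
    return result

X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
print(nearest_neighbors_blockwise(X, block_size=2))  # [1, 0, 3, 2]
```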

voxeloctree
  • Hrm... yeah, I am trying to code spectral clustering for certain types of image segmentation. Part of the code requires this sort of pairwise Euclidean distance. I tried hard-coding this function, but MATLAB takes hours to days to run... Is there a faster workaround? – user2521074 Jul 21 '13 at 04:34
  • I see. Post this on Stack Overflow as a new question. – voxeloctree Jul 21 '13 at 18:16
  • @voxeloctree: I almost didn't see the difference there, 6e3 (your example) vs. 6e4 (OP's data) :) – Amro Jul 22 '13 at 03:02
  • Right, it was to produce an example that the OP could run themselves to get a grasp of the memory strain of pdist. – voxeloctree Jul 22 '13 at 20:23