I have a NumPy array with 42,000 rows and 110,000 dimensions. I am trying to create a pairwise distance matrix (42000 × 42000) on a machine with 32 GB of RAM and 8 cores.

I tried pairwise_distances_chunked, but it only gives me a 3120 × 42000 distance matrix. I also tried pairwise_distances, but it fails with an out-of-memory error.

Any suggestions on what can be done?

1 Answer

According to the documentation, pairwise_distances_chunked is a generator that yields one chunk of the distance matrix at a time. From the way you phrased your question, it seems you did this:

D_chunk = next(pairwise_distances_chunked(X))

That code (which is the first example from the documentation) only gives you the first chunk.

What you want to do is this:

for chunk in pairwise_distances_chunked(X):
    do_something(chunk)
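
If do_something only needs a per-row summary, the full 42000 × 42000 matrix never has to exist. The sketch below is one possible way to do that, not necessarily what the question needs: it uses the reduce_func parameter of pairwise_distances_chunked to keep only the nearest neighbours of each row. The names K and k_nearest and the small random X are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import pairwise_distances_chunked

K = 5  # hypothetical number of neighbours to keep per row


def k_nearest(D_chunk, start):
    """Reduce one chunk of distances to the K nearest column indices per row."""
    D = D_chunk.copy()
    rows = np.arange(D.shape[0])
    # Row i of the chunk is row start + i of X; mask its distance to itself.
    D[rows, start + rows] = np.inf
    # argpartition puts the K smallest distances of each row in the first K slots.
    return np.argpartition(D, K, axis=1)[:, :K]


rng = np.random.default_rng(0)
X = rng.random((1000, 8))  # small stand-in for the real 42000 x 110000 array

# working_memory is in MiB; 1 MiB forces several chunks even for this small X.
chunks = pairwise_distances_chunked(X, reduce_func=k_nearest, working_memory=1)
neighbors = np.vstack(list(chunks))  # one row of K neighbour indices per sample
```

Each chunk is reduced as soon as it is produced, so peak memory is one chunk plus the small neighbors array.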
Kyle Pena
  • I tried exactly that too, without success. I get the error "Unable to allocate array with shape (5343, 100352) and data type float32" – Himanshu Jain Sep 12 '19 at 18:46
  • Because you're running out of memory. pairwise_distances_chunked sizes its chunks so that each one fits into memory. If you keep more than one chunk in memory, you won't have enough. The bottom line is that the pairwise distance matrix you are trying to create is too big for memory. Maybe you can save the chunks you want to disk one at a time. – Kyle Pena Sep 12 '19 at 19:36
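
Saving the chunks to disk one at a time could look like the sketch below, which writes each chunk into a disk-backed numpy.memmap. It is a minimal illustration under stated assumptions: the file name distances.dat, the temporary directory, the float32 storage type, and the small random X are all made up for the example. For scale, a 42000 × 42000 matrix takes about 7 GB in float32 (about 14 GB in float64), and a 42000 × 110000 float32 input alone takes about 18.5 GB, so the input may also need to be memory-mapped or reduced in dimension on a 32 GB machine.

```python
import os
import tempfile

import numpy as np
from sklearn.metrics import pairwise_distances_chunked

rng = np.random.default_rng(0)
X = rng.random((1000, 8))  # small stand-in for the real array
n = X.shape[0]

# Hypothetical output location; point this at a disk with enough free space.
path = os.path.join(tempfile.mkdtemp(), "distances.dat")

# Disk-backed n x n array: only the pages being written need to sit in RAM.
D = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))

start = 0
# working_memory is in MiB and caps the size of each chunk.
for chunk in pairwise_distances_chunked(X, working_memory=1):
    stop = start + chunk.shape[0]
    D[start:stop] = chunk  # chunks are consecutive row blocks of the matrix
    start = stop

D.flush()  # make sure everything is written to the file
```

The file can later be reopened with np.memmap(path, dtype=np.float32, mode="r", shape=(n, n)) and sliced without loading the whole matrix.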