I am trying to compute the dot product of two large dask arrays, X (35000 x 7500) and Y (7500 x 10). Since the result will also be large, I am storing it in HDF5:

f = h5py.File('output.hdf5')
f['output'] = X.dot(Y)

But the second command has produced no output after almost an hour. What is wrong? Is there a faster technique? Could the problem be the "chunks" I chose when creating X and Y?

Kavan

1 Answer

Consider the .to_hdf5 method or da.store function.

>>> X.dot(Y).to_hdf5('output.hdf5', 'output')

or

>>> result = X.dot(Y)
>>> output = f.create_dataset('/output', shape=result.shape, dtype=result.dtype)
>>> da.store(result, output)

The to_hdf5 method is probably easier for you. The da.store function is more general and works with other formats as well.

The __setitem__ function in h5py (what you're using when you say f['output'] = ...) is hardcoded to use NumPy arrays.
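As a rough end-to-end sketch of the two options above, the following uses small stand-in arrays rather than the 35000 x 7500 and 7500 x 10 arrays from the question. The names x_np, y_np, path and the dataset name /output_store are made up for illustration, and the chunk sizes are arbitrary assumptions, not a recommendation.

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

# Small stand-ins for X and Y; shapes and chunk sizes are illustrative only.
rng = np.random.default_rng(0)
x_np = rng.random((1000, 200))
y_np = rng.random((200, 10))
X = da.from_array(x_np, chunks=(250, 200))
Y = da.from_array(y_np, chunks=(200, 10))

result = X.dot(Y)  # lazy: nothing is computed yet

# Hypothetical output location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "output.hdf5")

# Option 1: let dask create the dataset and write it block by block.
result.to_hdf5(path, "/output")

# Option 2: create the dataset yourself, then store the blocks into it.
with h5py.File(path, "a") as f:
    dset = f.create_dataset("/output_store", shape=result.shape, dtype=result.dtype)
    da.store(result, dset)

# Read both datasets back to confirm they were written.
with h5py.File(path, "r") as f:
    print(f["/output"].shape, f["/output_store"].shape)
```

Either way, the result is written from the dask array's blocks instead of being handed to h5py as a single assignment.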

Here is the appropriate section in the documentation.

MRocklin
  • Sir, I saw your answer [here](http://stackoverflow.com/questions/34434217/why-is-dot-product-in-dask-slower-than-in-numpy). So how do I know what type of chunking is best for me? – Kavan Mar 25 '16 at 14:30
  • The to_hdf5 method will chunk your dataset similar to how your dask.array is chunked. I recommend just using that. – MRocklin Mar 25 '16 at 14:44
  • My X is float32 and Y is float96. It is showing me "TypeError: Unsupported float size". Any clues? – Kavan Mar 25 '16 at 15:01
  • I wasn't aware that NumPy supported dtype float96. Perhaps try float64 or 128? – MRocklin Mar 25 '16 at 15:46
  • I changed to float64. It says "ValueError: Chunks do not align" – Kavan Mar 25 '16 at 16:01
  • I'm afraid you'll have to give a clearer explanation of what's happening. Hopefully a small example that reproduces the error that I can try myself. – MRocklin Mar 25 '16 at 16:06
  • .to_hdf5() worked after restarting the shell. But now I am facing another problem. I have a very large dask array yHat, and "yHat[0][0].compute()" or "yHat[0].compute()" isn't working. It freezes my shell, and keyboard interrupt doesn't work either. – Kavan Mar 26 '16 at 14:09
  • I recommend starting a new stackoverflow question for your new issue. – MRocklin Mar 26 '16 at 23:12
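For the chunking and dtype questions raised in the comments above, one way to see what will be written is to inspect the arrays before calling to_hdf5. This is a minimal sketch, assuming small float32 stand-in arrays (float96 is not available on every platform, so it is not used here); the names X64 and Y64 and the chunk sizes are hypothetical.

```python
import dask.array as da
import numpy as np

# Stand-in arrays; shapes, chunk sizes and dtypes are illustrative assumptions.
X = da.ones((1000, 200), chunks=(250, 200), dtype="float32")
Y = da.ones((200, 10), chunks=(200, 10), dtype="float32")

# .chunks lists the block sizes along each axis; .dtype is the element type.
print(X.chunks, X.dtype)
print(Y.chunks, Y.dtype)

# Cast both operands to float64 (lazy, like every other dask operation).
X64 = X.astype("float64")
Y64 = Y.astype("float64")

# Change the block layout explicitly if the current one is unsuitable.
X64 = X64.rechunk((500, 200))

result = X64.dot(Y64)
print(result.chunks, result.dtype)

# Compute only a small slice as a quick check before writing everything.
print(result[0, :3].compute())
```

Checking .chunks and .dtype on the result first shows the layout and element type the HDF5 dataset would be created from.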