
I have a 2D dask array of shape (4950, 4950) that I want to process in parallel. I am following the approach described at: https://docs.dask.org/en/latest/delayed-best-practices.html#don-t-call-dask-delayed-on-other-dask-collections

print(da.shape)
partitions = da.to_delayed()
print(partitions)
delayed_values = [dask.delayed(funct)(part) for part in partitions]
print(delayed_values)

The result I am getting is:

(4950, 4950)
[[Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 0, 0))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 0, 1))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 0, 2))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 0, 3))]
 [Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 1, 0))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 1, 1))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 1, 2))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 1, 3))]
 [Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 2, 0))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 2, 1))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 2, 2))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 2, 3))]
 [Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 3, 0))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 3, 1))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 3, 2))
  Delayed(('gt-f3b8d1635832fc9b88447def18b4b7d0', 3, 3))]]
[Delayed('funct-c0044e9f-4b8e-4d02-b364-f6a483eaae2f'), 
 Delayed('funct-d2d14dcd-6f0a-4198-b999-221b0609bcaa'), 
 Delayed('funct-1951008c-14f4-43da-bbc1-443e90aae029'), 
 Delayed('funct-a254e3ba-2d45-45f8-bae4-85ba8c37a32f')]

I want to find the row index range (first and last row index) of each partition, so that I can save the computed result for each index to the final output file.

I have not been able to find much documentation on partitions. Any help or link that shows how to find the row index would be highly appreciated.

Manvi

1 Answer

For Dask arrays, look at the .chunks attribute, which holds the chunk sizes along each axis. In particular, I think you will probably want something like:

[np.cumsum(c) for c in x.chunks]

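Each cumulative sum is the exclusive end offset of a chunk along that axis, so a chunk starts where the previous one ends (at 0 for the first chunk).

For more information, see https://docs.dask.org/en/latest/array-design.html#chunks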
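As a minimal sketch of how the cumulative sums could be turned into first and last row indices per block: the array `x` below is only a stand-in for the one in the question, the chunk size of 1238 is an assumption picked to reproduce a 4 x 4 grid of blocks, and `block_bounds` is a hypothetical helper, not part of Dask.

```python
import numpy as np
import dask.array as dsa

# Assumption: a stand-in for the (4950, 4950) array from the question.
# The chunk size of 1238 is chosen only to reproduce a 4 x 4 grid of blocks.
x = dsa.ones((4950, 4950), chunks=(1238, 1238))


def block_bounds(chunk_sizes):
    """Hypothetical helper: (first, last) index pairs, both inclusive, for one axis."""
    stops = np.cumsum(chunk_sizes)            # exclusive end of each chunk
    starts = stops - np.asarray(chunk_sizes)  # first index of each chunk
    return [(int(a), int(b) - 1) for a, b in zip(starts, stops)]


row_bounds = block_bounds(x.chunks[0])
col_bounds = block_bounds(x.chunks[1])

# to_delayed() returns a 2D object array with one Delayed per block,
# so blocks[i, j] covers rows row_bounds[i] and columns col_bounds[j].
blocks = x.to_delayed()
for i in range(blocks.shape[0]):
    first_row, last_row = row_bounds[i]
    print(f"block row {i}: rows {first_row} to {last_row}")
```

Note that looping directly over the result of to_delayed() walks its first axis, so each item is a whole row of blocks rather than a single block. If one task per block is wanted, blocks.ravel() (or indexing with blocks[i, j]) gives the individual Delayed objects, and the (i, j) position identifies the matching entries in row_bounds and col_bounds.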

MRocklin