7

What are the ways we can perform item assignment in Dask arrays? Even a very simple item assignment like a[0] = 2 does not work.

Alger Remirata
  • Correct. This is the first limitation noted in the documentation: http://dask.pydata.org/en/latest/array-overview.html#limitations – MRocklin Dec 02 '16 at 15:35
  • Ok, if this is the case, is there a way to update the elements of a dask array to produce a new dask array? There is no map function here. – Alger Remirata Dec 02 '16 at 16:01
  • @MRocklin, if dask arrays are immutable, it will be very hard to write blockwise and columnwise algorithms like matrix factorization. Support for indexing/slicing will also be less useful, because it cannot be used for updating. Do you have alternatives for performing updates to dask arrays so that columnwise/row-wise updates can be used? – Alger Remirata Dec 02 '16 at 16:39
  • The dask schedulers assume that all operations are pure. This will remain so for the moderate future. You can still write distributed matrix algorithms, they just involve copies. For BLAS L3 operations this shouldn't be too bad. See https://github.com/dask/dask/blob/master/dask/array/linalg.py – MRocklin Dec 03 '16 at 13:31
  • "There is no map function here." Have you looked at the dask.array API? http://dask.pydata.org/en/latest/array-api.html – MRocklin Dec 03 '16 at 13:32
  • @MRocklin, thanks for the responses! – Alger Remirata Dec 05 '16 at 07:41

2 Answers

7

Correct. This is the first limitation noted in the documentation.

In general, workflows that involve for loops and direct assignment of individual elements are hard to parallelize. Dask array does not attempt to support them.
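As a rough illustration of the copy-based style mentioned in the comments, here is a minimal sketch that gets the effect of a[0] = 2 by building a new array instead of mutating the old one. It assumes `da.where` and `da.arange` from the public dask.array API; the variable names and the chunk size are made up for the example.

```python
import dask.array as da

CHUNK = 3  # illustrative chunk size, not a recommendation

# The original array; it is never modified below.
a = da.arange(6, chunks=CHUNK)

# Positions 0..5, chunked like `a`, used to build a boolean mask.
idx = da.arange(a.shape[0], chunks=CHUNK)

# New array: 2 where the mask is True (position 0), the old value elsewhere.
b = da.where(idx == 0, 2, a)

result = b.compute()
print(result)
```

Everything stays lazy until `compute()` is called, and `a` keeps its original values, which fits the scheduler's assumption that operations are pure.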

MRocklin
5

As of Dask version 2021.04.1, this type of assignment is supported; see the Dask assignment docs for details.

It is a fairly complete implementation of indexed assignment, including broadcasting and masked assignments. As you would expect, the assignment can be lazily embedded within a sequence of other operations. See the aforementioned docs for the very few indexed assignment cases that work in NumPy but not yet in Dask.
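A short sketch of what this looks like in practice, assuming Dask 2021.04.1 or later is installed. The array shapes, chunk sizes, and values are arbitrary choices for the example.

```python
import dask.array as da

# The case from the question: assign to a single element.
a = da.arange(6, chunks=3)
a[0] = 2  # recorded lazily, nothing is computed yet

# Slice and masked assignment on a 2-D array.
x = da.zeros((3, 4), chunks=(2, 2))
x[0] = 2        # scalar broadcast across the first row
x[:, 1] = 7     # scalar broadcast down the second column
x[x > 5] = -1   # masked assignment using a boolean dask array

a_result = a.compute()
x_result = x.compute()
print(a_result)
print(x_result)
```

The assignments only extend the task graph; the actual work happens when `compute()` is called.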

dhassell