
Consider the following usage:

In [49]: class MyClass(dict):
    ...:     def __init__(self,a):
    ...:         self.a = a
    ...:     def get(self):
    ...:         return self.a
    ...:     

In [50]: a = MyClass(10)

In [51]: @delayed(pure=True)
    ...: def myFunc(a):
    ...:     return a
    ...: 

In [52]: myFunc(a)
Out[52]: Delayed('myFunc-96ed02ea2192c363a45cec74ef5eaefb')

In [53]: myFunc(a)
Out[53]: Delayed('myFunc-96ed02ea2192c363a45cec74ef5eaefb')

In [54]: a = MyClass(10)

In [55]: myFunc(a)
Out[55]: Delayed('myFunc-96ed02ea2192c363a45cec74ef5eaefb')

In [56]: a.a = 1000

In [57]: myFunc(a)
Out[57]: Delayed('myFunc-96ed02ea2192c363a45cec74ef5eaefb')

In [58]: a['foo'] = 'bar'

In [59]: myFunc(a)
Out[59]: Delayed('myFunc-bf4162396d43f090e476de70d30de251')

My intention here is to tell dask which parameters to use when determining function purity for caching purposes. This is useful, for example, when the object has data retrieval methods that in turn depend on internal parameters (plotting parameters, for instance). If I pass this data object through dask, I would not want the key of the delayed instance to change when these parameters change. However, I do want the data itself to be saved (basically, self[key] = val here, which is what is happening).

This seems to do the trick.

Will this behaviour be supported? Is there a better way? Or is this not compatible with dask's vision? Thanks!

julienl

1 Answer


If you have a special naming scheme for your tasks then one option is to supply a name explicitly with the dask_key_name= keyword option.

In [1]: import dask

In [2]: @dask.delayed(pure=True)
   ...: def f(x, y=10):
   ...:     return x + y
   ...: 

In [3]: f(1, y=10)
Out[3]: Delayed('f-3361ad78bd5bb95a5f748567a245a09e')

In [4]: f(1, y=11)
Out[4]: Delayed('f-4bf1967f6713377c1c0fab72b60ebfd3')

In [5]: f(1, y=10, dask_key_name='f-1')
Out[5]: Delayed('f-1')

In [6]: f(1, y=11, dask_key_name='f-1')
Out[6]: Delayed('f-1')

You could probably use this along with dask's tokenization function, dask.base.tokenize, to build your own dask.delayed variant that tokenizes only the inputs you care about.
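As a rough sketch of that idea (not an official dask feature): the decorator below, `delayed_on_items`, is a hypothetical helper invented for this example. It assumes the first argument is a dict-like object and builds the key from its items only, so attributes such as `a.a` are ignored on purpose.

```python
import dask
from dask.base import tokenize


class MyClass(dict):
    """Dict subclass with an extra attribute that should not affect the key."""

    def __init__(self, a):
        super().__init__()
        self.a = a


def delayed_on_items(func):
    """Hypothetical helper: key each task on the dict items of its first argument."""
    lazy = dask.delayed(func)

    def wrapper(data, *args, **kwargs):
        # Tokenize a plain-dict copy so only the items (not attributes) count.
        token = tokenize(dict(data), *args, **kwargs)
        name = f"{func.__name__}-{token}"
        return lazy(data, *args, dask_key_name=name, **kwargs)

    return wrapper


@delayed_on_items
def my_func(data):
    return data


a = MyClass(10)
a["foo"] = "bar"
key_before = my_func(a).key

a.a = 1000  # attribute change only: the key is expected to stay the same
key_same = my_func(a).key

a["foo"] = "baz"  # item change: the key is expected to change
key_changed = my_func(a).key
```

Because the name is supplied explicitly, dask does not check that two calls with the same key really compute the same thing, so the choice of what to tokenize is entirely the caller's responsibility.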

MRocklin
  • Oh yes, that's perfect thanks! I looked for tokenize in the documentation after you suggested it, seems there's something related here: http://dask.pydata.org/en/latest/examples/array-extend.html?highlight=tokenize – julienl Mar 22 '17 at 21:37
  • dask.base.tokenize is internal API and so not well documented. You give it things, it produces a token `tokenize(1, 'hello', np.array(5), x=10) -> 'sf7g7gf98g7g8s7g'` – MRocklin Mar 23 '17 at 11:08
  • Is there any way to make method calls on a delayed class deterministic? I can't figure that out, and I would like to avoid adding a `dask_key_name` argument and computing a token every time I call a method. Basically, it would be nice to define a delayed class, say `myclass`, where any call such as `myclass.func()` is assumed pure. Decorating the function with a pure delayed object is not enough. Does this exist and have I missed it? It would make life much easier for me, thanks! – julienl Apr 04 '17 at 19:43