Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions related to this Python interface.
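As a minimal illustration of what the tag covers, the sketch below wraps ordinary functions and a plain list with dask.delayed. Calling a wrapped function records a task and returns a Delayed proxy instead of running immediately; .compute() then executes the graph. The functions inc and add are hypothetical examples, not part of Dask.

```python
from dask import delayed


def inc(x):
    return x + 1


def add(x, y):
    return x + y


# Wrapping functions: the calls are recorded as tasks instead of running now.
a = delayed(inc)(1)
b = delayed(inc)(2)
total = delayed(add)(a, b)

# Wrapping an object: the list becomes a Delayed proxy usable as an input.
data = delayed([1, 2, 3])
data_sum = delayed(sum)(data)

print(total.compute())     # 5
print(data_sum.compute())  # 6
```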

290 questions
2 votes · 1 answer

Custom Dask traversable object

I used a custom dictionary-like object to store the results of a Dask graph, but computing that object does not compute its children. Is it possible to change the custom object in such a way that Dask is…
Henk
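One workaround, sketched under the assumption that Dask only looks for Delayed values inside the built-in containers it knows how to traverse (such as list, tuple, set and dict), is to hand dask.compute a plain dict built from the custom object and then rebuild the custom container from the computed values. ResultStore is a hypothetical stand-in for the asker's class.

```python
from collections.abc import MutableMapping

import dask
from dask import delayed


class ResultStore(MutableMapping):
    """Hypothetical dict-like container for the outputs of a Dask graph."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = ResultStore()
store["square"] = delayed(lambda x: x * x)(4)
store["double"] = delayed(lambda x: 2 * x)(4)

# Convert to a plain dict so dask.compute finds the Delayed values inside it,
# then wrap the computed values back into the custom container.
(computed,) = dask.compute(dict(store))
result = ResultStore()
result.update(computed)
print(dict(result))  # {'square': 16, 'double': 8}
```

Implementing Dask's collection protocol on the class would be the more thorough alternative; the conversion above keeps the class unchanged.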
2 votes · 1 answer

Xarray with Dask.delayed is slow: how to select/interpolate quickly between two datasets

I have two datasets (called satdata and atmosdata). Atmosdata is evenly gridded on latitude and longitude. It has the dimensions (latitude: 713, level: 37, longitude: 1440, time: 72) and a total size of 12 GB. Atmosdata has several…
Xiaoni
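If the per-point work is a nearest-neighbour lookup, one option that avoids building many small delayed tasks is xarray's vectorized (pointwise) indexing: passing DataArray indexers that share a new dimension selects one grid value per satellite point in a single call. The tiny atmos grid and the sat_lat/sat_lon points below are hypothetical stand-ins for atmosdata and satdata.

```python
import numpy as np
import xarray as xr

# Hypothetical small stand-in for the gridded atmosdata.
lat = np.linspace(-90, 90, 7)
lon = np.linspace(0, 350, 36)
values = np.arange(lat.size * lon.size, dtype=float).reshape(lat.size, lon.size)
atmos = xr.Dataset(
    {"t": (("latitude", "longitude"), values)},
    coords={"latitude": lat, "longitude": lon},
)

# Hypothetical satellite track: one (lat, lon) pair per observation.
sat_lat = xr.DataArray([-88.0, 1.0, 59.0], dims="points")
sat_lon = xr.DataArray([11.0, 182.0, 349.0], dims="points")

# Indexers sharing the "points" dimension select pointwise, not as an outer product.
matched = atmos.sel(latitude=sat_lat, longitude=sat_lon, method="nearest")
print(matched["t"].dims)  # ('points',)
print(matched["t"].values)
```

With a Dask-backed dataset (for example after .chunk()), the same call should stay lazy until .compute(). Dataset.interp accepts the same kind of indexers for linear interpolation, but it needs SciPy.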
2 votes · 1 answer

Parallel SQL queries

How does one run SQL queries with different column dimensions in parallel using dask? Below was my attempt:

    from dask.delayed import delayed
    from dask.diagnostics import ProgressBar
    import dask
    ProgressBar().register()
    con =…
Nick
2 votes · 1 answer

How to avoid large objects in task graph

I am running simulations using dask.distributed. My model is defined in a delayed function and I stack several realizations. A simplified version of what I do is given in this code snippet:

    import numpy as np
    import xarray as xr
    import dask.array…
astoeriko
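One way to keep a large array from being embedded separately in every task is to wrap it with dask.delayed once and pass that single Delayed object to each call, so the graph holds one copy under one key (with dask.distributed, Client.scatter serves a similar purpose). The simulate function, the seeds and the array size are hypothetical.

```python
import dask
import numpy as np
from dask import delayed

# Hypothetical large input shared by all realizations.
params = np.random.default_rng(0).normal(size=(500, 500))

# Wrap the array once: every task below refers to this single graph node
# instead of embedding its own copy of the array.
params_d = delayed(params)


@delayed
def simulate(p, seed):
    # Hypothetical model: a seeded random perturbation of the parameters.
    rng = np.random.default_rng(seed)
    return float((p + rng.normal(size=p.shape)).mean())


realizations = [simulate(params_d, seed) for seed in range(8)]
results = dask.compute(*realizations)
print(len(results))  # 8
```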
2 votes · 1 answer

Dask delayed / dask array no response

I have set up a distributed Dask cluster and used it to load and transform a bunch of data. It works like a charm. Now I want to use it to do some processing in parallel. Here's my function:

    el = 5000
    n_using = 26
    n_across = 6
    mat =…
Sid R
2 votes · 0 answers

Bokeh UI not working with Dask on another host

I've run Dask with Bokeh on a cluster of 4 machines. When I open the Dask UI page on port 8787, the graphs etc. are not there (the UI is empty), but the normal text and simple graphics are. I'm also getting an error in the console.
2 votes · 1 answer

How to avoid set_index on a pre-sorted DataFrame constructed with from_delayed?

I am trying to get the expression df.resample('1T', how='mean').sum() to work in Dask, but I am running into an issue where it seems Dask needs me to explicitly call set_index on the DataFrame before performing resample. I get the error below... >>>…
PhaKuDi
2 votes · 1 answer

How to use Dask Delayed with rpy2?

I'm attempting to use Dask, specifically dask.delayed, to generate time series forecasts in parallel using rpy2 and the forecast package in R. My process works when using only 1 core, but I get a NotImplementedError: Conversion 'py2ri' not defined for…
Davis
2 votes · 1 answer

Passing Futures as arguments in Dask

What is the best way to pass a Future to a Dask Delayed function such that the Future stays intact? In other words, how can we ensure the function will get the actual Future and not the result it represents?
jakirkham
2 votes · 1 answer

How does dask.delayed handle mutable inputs?

If I have a mutable object, for example a dict, how does Dask handle passing it as an input to delayed functions? Specifically, what happens if I update the dict between delayed calls? I tried the following example, which seems to suggest…
postelrich
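Rather than depending on when Dask happens to capture the contents of a mutable argument, a defensive pattern is to pass an explicit copy to each delayed call, so later mutations cannot leak into tasks that have not run yet. The cfg dict and the scale function are hypothetical.

```python
import copy

import dask
from dask import delayed


@delayed
def scale(cfg, x):
    return cfg["factor"] * x


cfg = {"factor": 2}
tasks = []
for factor in (2, 3, 4):
    cfg["factor"] = factor
    # Snapshot the dict as it is now; later updates do not affect this task.
    tasks.append(scale(copy.deepcopy(cfg), 10))

print(dask.compute(*tasks))  # (20, 30, 40)
```

Dask generally assumes task inputs are not mutated after submission, so treating inputs as immutable (or copying them as above) keeps results predictable.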
2 votes · 1 answer

Add/Enable timestamp to Dask scheduler/workers console logs

Is there a way to add/enable timestamps in the Dask scheduler/worker console logs? Versions: dask 0.15.0-py35_0, distributed 1.17.1-py35_0. With these versions timestamps are not enabled - Scheduler - distributed.scheduler - INFO -…
B Jacob
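For a scheduler or workers running inside the current Python process, a stdlib-only sketch is to give the handlers of the "distributed" logger a format that includes %(asctime)s; the log lines above come from loggers such as distributed.scheduler, which are children of it. This is an assumption-laden sketch: workers started as separate processes (for example from the dask-worker command line) configure their own logging, which recent distributed releases read from the Dask configuration rather than from your script.

```python
import logging

# Prefix every record with a timestamp (%(asctime)s).
fmt = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

# Assumption: "distributed.scheduler", "distributed.worker", ... propagate to this logger.
logger = logging.getLogger("distributed")
if not logger.handlers:
    # No handler configured yet in this process: log to stderr.
    logger.addHandler(logging.StreamHandler())
for handler in logger.handlers:
    handler.setFormatter(fmt)
logger.setLevel(logging.INFO)

logging.getLogger("distributed.scheduler").info("timestamps enabled")
```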
2 votes · 1 answer

distributed.protocol.pickle - INFO - Failed to serialize. Exception: Pickling an AuthenticationString object is disallowed for security reasons

python code:: from dask.distributed import variable, Client from multiprocessing import Process, current_process def my_task(proc): print("process object::", proc) def doubler(number): # do stuff returns something proc =…
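The error arises because multiprocessing Process objects carry an authentication key that Python refuses to pickle, while Dask's distributed scheduler has to pickle task arguments to ship them to workers. A sketch of the usual fix is to pass plain, picklable data (such as the process name or PID) instead of the Process object; describe_process is a hypothetical helper.

```python
import os
import pickle
from multiprocessing import current_process

proc = current_process()

# Pickling the Process object itself fails because of its authkey.
try:
    pickle.dumps(proc)
    pickled_ok = True
except (TypeError, pickle.PicklingError) as exc:
    pickled_ok = False
    print("cannot pickle:", exc)


def describe_process():
    # Hypothetical helper: send only plain, picklable data to tasks.
    p = current_process()
    return {"name": p.name, "pid": os.getpid()}


info = describe_process()
payload = pickle.dumps(info)  # a dict of str/int pickles without trouble
print(pickle.loads(payload)["name"])
```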
2 votes · 2 answers

Efficiently create a dask.array from a dask.Series of lists

What is the most efficient way to create a dask.array from a dask.Series of lists? The series consists of 5 million lists of 300 elements each. It is currently divided into 500 partitions. Currently I am trying: pt = [delayed(np.array)(y) for y in …
Daniel Mahler
2 votes · 0 answers

Load dask dataframe from CSV lazily (inside delayed)

While using dask.distributed, I'm trying to load a Dask DataFrame from a CSV file on S3 inside a delayed function, like this:

    @delayed
    def func1():
        ...
        return df.read_csv(*s3_url*, ...)

read_csv() does not need to interact with the distributed client, so I…
evilkonrex
2 votes · 1 answer

Access a single element in large published array with Dask

Is there a faster way to retrieve only a single element from a large published array with Dask, without retrieving the entire array? In the example below, client.get_dataset('array1')[0] takes roughly the same time as…
sudouser2010