Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions about that interface.
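
As a rough illustration of the description above, the following minimal sketch wraps both a function and a plain object with `dask.delayed`. The helper `add` and the variable names are hypothetical and chosen only for this example; it assumes the `dask` package is installed.

```python
from dask import delayed
from dask.delayed import Delayed


def add(x, y):
    # Hypothetical helper used only for this illustration.
    return x + y


# Wrapping a function: calling the wrapper records a task instead of running it.
lazy_sum = delayed(add)(1, 2)

# Wrapping an object: the list becomes a Delayed proxy that other tasks can consume.
lazy_list = delayed([3, 1, 2])
lazy_sorted = delayed(sorted)(lazy_list)

# Nothing has executed yet; both names refer to Delayed proxies.
is_lazy = isinstance(lazy_sum, Delayed) and isinstance(lazy_sorted, Delayed)

# compute() runs the recorded task graph and returns concrete values.
result = lazy_sum.compute()
sorted_result = lazy_sorted.compute()
```

Work only happens when `compute()` is called, which is why most questions under this tag are about how the task graph is built and executed.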

290 questions
4 votes, 1 answer

dask how to define a custom (time fold) function that operates in parallel and returns a dataframe with a different shape

I am trying to implement a time fold function to be 'map'ed to various partitions of a dask dataframe which in turn changes the shape of the dataframe in question (or alternatively produces a new dataframe with the altered shape). This is how far I…
PhaKuDi
  • 141
  • 8
4 votes, 1 answer

Creating a dask bag from a generator

I would like to create a dask.Bag (or dask.Array) from a list of generators. The gotcha is that the generators (when evaluated) are too large for memory. delayed_array = [delayed(generator) for generator in list_of_generators] my_bag =…
danodonovan
  • 19,636
  • 10
  • 70
  • 78
4 votes, 1 answer

Using dask delayed to create dictionary values

I'm struggling to figure out how to get dask delayed to work on a particular workflow that involves creating a dictionary. The idea here is that func1, func2, func3 can run independently of each other at the same time, and I want the results of…
blahblahblah
  • 2,299
  • 8
  • 45
  • 60
4 votes, 2 answers

How do I capture dask-worker console logs in a file?

In the code below, I want to capture "dask_client_log_msg" and other task-logs in one file and "dask_worker_log_msg" and other client-logs in a separate file. The client obviously runs in a separate process from the worker, so I need one…
TheCodeCache
  • 820
  • 1
  • 7
  • 27
4 votes, 2 answers

How to explicitly stop a running/live task through dask?

I have a simple task which is scheduled by dask-scheduler and is running on a worker node. I need to be able to stop the task on demand, whenever the user wants.
TheCodeCache
  • 820
  • 1
  • 7
  • 27
4 votes, 1 answer

Multiple images mean dask.delayed vs. dask.array

Background I have a list with the paths of thousand image stacks (3D numpy arrays) preprocessed and saved as .npy binaries. Case Study I would like to calculate the mean of all the images and in order to speed the analysis I thought to parallelise…
s1mc0d3
  • 523
  • 2
  • 15
4 votes, 1 answer

Dask graph execution and memory usage

I am constructing a very large DAG in dask to submit to the distributed scheduler, where nodes operate on dataframes which themselves can be quite large. One pattern is that I have about 50-60 functions that load data and construct pandas dataframes…
Adam Klein
  • 476
  • 1
  • 4
  • 13
3 votes, 2 answers

Applying a function to each timestep in an xarray.Dataset, and return lazy Dask array outputs

I have an xarray.Dataset with two 1D variables sun_azimuth and sun_elevation with multiple timesteps along the time dimension: import xarray as xr import numpy as np ds = xr.Dataset( data_vars={ "sun_azimuth": ("time", [10, 20, 30, 40,…
3 votes, 1 answer

limit number of CPUs used by dask compute

The code below takes approximately 1 second to run on an 8-CPU system. How can I manually configure the number of CPUs used by dask.compute, e.g. to 4 CPUs, so that the code below takes approximately 2 seconds even on an 8-CPU system? import dask from time import sleep def…
Russell Burdt
  • 2,391
  • 2
  • 19
  • 30
3 votes, 1 answer

Parallelizing list filtering

I have a list of items that I need to filter based on some conditions. I'm wondering whether Dask could do this filtering in parallel, as the list is very long (a few dozen million records). Basically, what I need to do is this: items = [ …
Victor
  • 1,163
  • 4
  • 25
  • 45
3 votes, 1 answer

Dask: Continue with others task if one fails

I have a simple (but large) task Graph in Dask. This is a code example results = [] for params in SomeIterable: a = dask.delayed(my_function)(**params) b = dask.delayed(my_other_function)(a) …
Andrex
  • 602
  • 1
  • 7
  • 22
3 votes, 1 answer

How can I systematically reuse the results of delayed functions in Dask?

I am working on building a computation graph with Dask. Some of the intermediate values will be used multiple times, but I would like those calculations to only run once. I must be making a trivial mistake, because that's not what happens. Here is a…
poldpold
  • 53
  • 6
3 votes, 0 answers

Dask distributed.core - ERROR - 'tuple' object does not support item assignment

I am using Dask and Cython in my project, where I invoke Cython code after registering with the client and collect the result from the Cython code in my Dask Python code. When I create a cluster with processes=True, it works fine. But, as soon…
3 votes, 1 answer

Dask: How to return a tuple of futures in client.submit

I need to return a tuple from a task which has to be unpacked in the main process because each element of the tuple will go to different dask tasks. I would like to avoid unnecessary communication so I think that the tuple elements should be…
z4m0
  • 33
  • 4
3 votes, 0 answers

Loading feather files from s3 with dask delayed

I have an S3 folder with multiple .feather files that I would like to load into dask using Python, as described here: Load many feather files in a folder into dask. I have tried two ways, and both give me different errors: import pandas as pd import…
Dean
  • 105
  • 1
  • 6