Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions about that interface.
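
As a minimal sketch of what the tag covers, the snippet below wraps two plain functions with delayed and only runs them when compute is called. The functions inc and add and their inputs are hypothetical placeholders chosen for illustration, not taken from any question listed here.

```python
from dask import compute, delayed


def inc(x):
    # Hypothetical helper: add one to a number.
    return x + 1


def add(a, b):
    # Hypothetical helper: add two numbers.
    return a + b


# Calling a delayed-wrapped function does not run it. Each call returns a
# Delayed proxy that records the task and its dependencies.
x = delayed(inc)(1)
y = delayed(inc)(2)
total = delayed(add)(x, y)

# The task graph executes only when a result is requested.
result = total.compute()

# dask.compute accepts one or more Delayed objects and returns a tuple.
(also,) = compute(total)
```

Delayed proxies can be passed as arguments to other delayed calls, which is how the dependency between inc and add is recorded here.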

290 questions
1 vote, 3 answers

Synchronize dask map_partitions with print functions

I have the following code: def func1(df): x = 1 print('Processing func1') return x def func2(df): x = 2 print('Processing func2') return x ddf = from_pandas(df, npartitions=3) print('func1 processing…
ps0604 • 1,227 • 23 • 133 • 330
1 vote, 1 answer

Using dask to return more than one dataframe

I am using read_csv() to read a long list of csv files and return two dataframes. I have managed to speed up this action by using dask. Unfortunately, I have not been able to return multiple variables when using dask. The minimum working example…
Tanjil • 198 • 1 • 17
1 vote, 2 answers

Dask: Visualize dask task graph of nested delayed functions

For the following toy example, I am attempting to parallelize some nested for loops using dask delayed/compute. Is there any way I can visualize the task graph for the following? import time from dask import compute, delayed @delayed def…
khubull • 31 • 1 • 3
1 vote, 0 answers

distribute max X columns using dask

I have very large HDF files, each with a dataset X of shape, for example, (24000000, 8000) and dtype Int16. I need to run a function on a subset of each of these columns, say X[50000:-50000,:]. This is way too big for memory, so I need to do something…
1 vote, 2 answers

dask delayed functions on pandas groupby objects

I couldn't figure out how to compute delayed objects coming from a df.groupby.apply() operation. I would really appreciate it if someone could help. Here is some sample code I wrote: import pandas as pd import dask df =…
Sinem • 13 • 2
1 vote, 1 answer

Compute list of dask delayed object

I have gone through all the similar questions and the solutions provided, but I am not getting the desired output. I have a list of dask delayed objects. for y in ys: projection = Projection(data, X, y) fi = projection.decode() var.append(fi) where Projection class…
ipj • 67 • 7
1 vote, 1 answer

iterate through dask delayed dict

I have many delayed dicts returned from a dask delayed function. I would like to aggregate them into a summary_dict like below. The items function doesn't work on a delayed object. @dask.delayed def get_dict(date): return { 'a': {'date':…
abisko • 663 • 8 • 21
1 vote, 1 answer

Assign a delayed object to a dask array TypeError: Delayed objects of unspecified length have no len()

I have the following setting: a function returning an array and a Dask array. I want to call the function inside a for loop and fill a dask array with the function's return. This should be done in parallel. import dask import numpy as np def…
alpha027 • 302 • 2 • 13
1 vote, 1 answer

dask distributed - right way to use list returned from a delayed function

My question may be dumb, but I just started learning dask distributed. Any help is appreciated. I have code like below: @dask.delayed def do_something(date): return x, y get_item0 = dask.delayed(operator.itemgetter(0)) …
abisko • 663 • 8 • 21
1 vote, 1 answer

Can you use Dask DataFrame as lookup table in dask.delayed?

I have data at a scale where a DataFrame merge is unlikely to be successful -- previous attempts have resulted in excessive data shuffling, out of memory errors on the scheduler, and communication timeouts in the workers, even with indexing,…
1 vote, 1 answer

Is it possible to limit memory usage by writing to disk?

I cannot tell whether what I want to do in Dask is possible... Currently, I have a long list of heavy files. I am using the multiprocessing library to process every entry of the list. My function opens an entry, operates on it, saves the result in a…
nick • 49 • 7
1 vote, 1 answer

Queueing up workers in Dask

I have the following scenario that I need to solve with the Dask scheduler and workers: The Dask program has N functions called in a loop (N defined by the user). Each function is started with delayed(func)(args) to run in parallel. When each function…
ps0604 • 1,227 • 23 • 133 • 330
1 vote, 1 answer

Create lazy xarray object from Future

I have a dask.delayed function that takes an xarray.DataArray as an argument and returns one as well. I'm creating a few of these delayed tasks and passing them to client.compute using dask.distributed. Each call to compute returns a…
Val • 6,585 • 5 • 22 • 52
1 vote, 1 answer

Dask fold with two data frames

This is a textbook question on how to add two DataFrames using Dask (specifically with fold)... I can't seem to get it to work though, so I wanted to reach out to see what I'm doing wrong. (I'm on Python 3.8.5 with Dask 2021.4.1) The code below…
user5406764 • 1,627 • 2 • 16 • 23
1 vote, 0 answers

Storing objects on workers and executing methods

I have an application with a set of objects that each need a lot of setup (up to 30 s to 1 minute per object). Once they have been set up, I want to pass a parameter vector (small, <50 floats) and return a couple of small arrays back.…
Jose • 2,089 • 2 • 23 • 29