Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface consisting of the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions related to this Python interface.
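
For orientation, a minimal sketch of the interface this tag covers: delayed wraps a function call so it builds a task graph instead of executing immediately, and compute() runs the graph.

    from dask import delayed

    @delayed
    def inc(x):
        return x + 1

    @delayed
    def add(a, b):
        return a + b

    # Nothing runs yet; these calls only build Delayed proxies / a task graph.
    total = add(inc(1), inc(2))

    # compute() executes the graph (in parallel where possible) and returns 5.
    print(total.compute())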

290 questions
1 vote · 1 answer

Distribution and computation of dask.delayed objects

Do dask.delayed objects get distributed by Dask on a cluster? Is the execution of their task graph also distributed on the cluster?
Dhruv Kumar (399)
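
A minimal sketch, assuming a dask.distributed scheduler is reachable (the address below is a placeholder): once a Client is created, delayed graphs triggered with .compute() are executed by the cluster's workers rather than locally.

    from dask import delayed
    from dask.distributed import Client

    # Connect to an existing scheduler (placeholder address); with no argument,
    # Client() starts a local cluster instead.
    client = Client("tcp://scheduler-address:8786")

    @delayed
    def double(x):
        return 2 * x

    @delayed
    def add(a, b):
        return a + b

    total = add(double(10), double(20))
    # The task graph behind `total` is shipped to the scheduler and its tasks
    # run on the workers; only the final result comes back to the client.
    print(total.compute())   # 60
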
1 vote · 1 answer

Can Dask parallelize contents inside a function?

I have a function written in Python. If the code inside that function is parallelizable, can I somehow parallelize it without making Dask API calls inside the function? I was thinking of whether…
Dhruv Kumar (399)
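
Dask cannot see inside an opaque function. A common workaround, sketched below under the assumption that the function's steps can be split out, is to restructure the body into separate functions and wrap those with delayed at the call site.

    from dask import delayed, compute

    # Hypothetical pieces of the original function's body, split out so the
    # caller can wrap them without touching their internals.
    def load(i):
        return list(range(i))

    def process(chunk):
        return sum(chunk)

    def run_parallel(n):
        # The original function's loop, rebuilt with delayed at the call site.
        parts = [delayed(process)(delayed(load)(i)) for i in range(n)]
        return compute(*parts)

    print(run_parallel(4))   # (0, 0, 1, 3)
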
1 vote · 0 answers

Submit looping calculation to Dask and get back the result

My co-worker and I have been setting up, configuring, and testing Dask for a week or so now, and everything is working great (can't speak highly enough about how easy, straightforward, and powerful it is), but now we are trying to leverage it for…
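
A minimal sketch, assuming each loop iteration is independent: build delayed calls in the loop and trigger them together, so the whole batch is scheduled at once and the results come back as a tuple.

    from dask import delayed, compute

    def simulate(step):
        # Stand-in for one iteration of the looping calculation
        return step ** 2

    tasks = [delayed(simulate)(step) for step in range(8)]
    results = compute(*tasks)   # runs the iterations in parallel
    print(results)              # (0, 1, 4, 9, 16, 25, 36, 49)
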
1 vote · 1 answer

How to specify the number of partitions when reading parquet into a Dask dataframe?

I read my parquet data as follows: file_names = glob.glob('./events/*/*/*/*/*/part*.parquet') pf = fp.ParquetFile(file_names, root='./events') pf.cats = {'customer': pf.cats['customer']} dfs = (delayed(pf.read_row_group_file)(rg, pf.columns,…
j-bennet (310)
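
A hedged sketch of one common approach: let dask.dataframe read the parquet dataset (fastparquet is what the question uses, but the path and engine below are assumptions) and then change the partition count afterwards with repartition.

    import dask.dataframe as dd

    # Path and engine are assumptions; the question builds the frame by hand
    # from fastparquet row groups instead.
    ddf = dd.read_parquet('./events', engine='fastparquet')
    print(ddf.npartitions)                  # typically one partition per row group/file

    ddf = ddf.repartition(npartitions=16)   # coarsen or split to the desired count
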
1 vote · 1 answer

How to manage dask-workers when I have many long-running tasks?

Suppose I have a 4-node Dask cluster in which dask-scheduler runs on node 1 and dask-worker runs on the remaining nodes, and I collectively submit 5 long-running tasks. What happens in this case is that 3 tasks are in…
TheCodeCache (820)
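
One knob that matters here is how many tasks each worker can run at once. A minimal local sketch (the worker and thread counts below are arbitrary): with distributed's LocalCluster, the equivalent of the dask-worker settings can be set programmatically.

    from dask.distributed import Client, LocalCluster

    # Arbitrary sizing for illustration: 3 worker processes, 2 threads each,
    # so up to 6 long-running tasks can execute concurrently.
    cluster = LocalCluster(n_workers=3, threads_per_worker=2)
    client = Client(cluster)

    def slow_square(x):
        return x * x

    futures = [client.submit(slow_square, i) for i in range(5)]
    print(client.gather(futures))   # [0, 1, 4, 9, 16]
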
1 vote · 1 answer

How to pass a client-side dependency to the dask-worker node

scriptA.py contents: import shlex, subprocess from dask.distributed import Client def my_task(params): print("params[1]", params[1]) ## prints python scriptB.py arg1 arg2 child = subprocess.Popen(shlex.split(params[1]), shell=False) …
TheCodeCache (820)
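
A hedged sketch of the usual mechanism for shipping a local module to the workers: Client.upload_file sends the file to every worker so task code running there can import it. The file name comes from the question, but scriptB.py being importable, and its run() function, are assumptions.

    from dask.distributed import Client

    client = Client("tcp://scheduler-address:8786")   # placeholder address

    # Ships the local file to every worker and puts it on their import path.
    client.upload_file("scriptB.py")

    def my_task(x):
        import scriptB            # now importable on the worker
        return scriptB.run(x)     # hypothetical function inside scriptB.py

    print(client.submit(my_task, 5).result())
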
1 vote · 0 answers

distributed.utils - ERROR - Existing exports of data: object cannot be re-sized

I am running a dask-scheduler on one node and my dask-worker on another node, and I submit a task to the dask-scheduler from a third node. It sometimes throws distributed.utils ERROR - Existing exports of data: object cannot be…
TheCodeCache (820)
1 vote · 0 answers

Dask dataframe to parquet fails with memory error

I created a Dask dataframe from multiple HDFS files and then tried to write the final dataframe back to HDFS (parquet), but it failed with a MemoryError. dask_df= for parquet_hdfs_path in hdfs_files: …
Santosh Kumar (761)
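
A hedged sketch of the streaming pattern that avoids holding everything in memory, assuming the inputs are readable with dd.read_parquet (the HDFS paths below are placeholders): read all inputs as one lazy dataframe and write it straight back out, so partitions flow through one at a time instead of being concatenated in RAM.

    import dask.dataframe as dd

    # Placeholder HDFS paths; a glob covers all the input files at once.
    ddf = dd.read_parquet('hdfs:///data/input/*.parquet')

    # Lazy transformations go here, e.g. ddf = ddf[ddf.value > 0]

    # Each partition is computed and written independently, so the full
    # dataframe never has to fit in memory on any single machine.
    ddf.to_parquet('hdfs:///data/output/')
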
1 vote · 1 answer

How can I visualize a subgraph of a dask graph?

Given that I have a really large task graph, my_delayed.visualize() either is impossible to generate or is too dense to be visually useful. If I have the key for a particular task, can I specify a particular depth or x number of parents and children to…
postelrich (3,274)
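
A sketch of one way to get part of the picture, assuming graphviz is installed: pull the low-level graph, cull it down to a single key, and render only the tasks that key depends on (this shows ancestors, not children).

    from dask.optimization import cull
    from dask.dot import dot_graph

    dsk = dict(my_delayed.__dask_graph__())   # full low-level graph as a dict
    key = my_delayed.key                      # or any intermediate key of interest

    # cull() drops every task that `key` does not depend on.
    sub_dsk, _ = cull(dsk, [key])
    dot_graph(sub_dsk, filename='subgraph')   # renders just that subgraph via graphviz
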
1 vote · 1 answer

How to find the inputs of a dask.delayed task?

Given a dask.delayed task, I want to get a list of all the inputs (parents) for that task. For example, from dask import delayed @delayed def inc(x): return x + 1 def inc_list(x): return [inc(n) for n in x] task =…
postelrich (3,274)
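
A minimal sketch of reading parents out of the low-level graph with dask.core.get_dependencies; the toy functions echo the question's example.

    from dask import delayed
    from dask.core import get_dependencies

    @delayed
    def inc(x):
        return x + 1

    @delayed
    def total(xs):
        return sum(xs)

    task = total([inc(n) for n in range(3)])
    dsk = dict(task.__dask_graph__())

    # Keys of the tasks whose outputs feed directly into `task`
    parents = get_dependencies(dsk, task.key)
    print(parents)   # a set of three 'inc-...' keys
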
1 vote · 1 answer

Using Dask to parallelize HDF read-translate-write

TL;DR: We're having issues parallelizing Pandas code with Dask that reads and writes from the same HDF. I'm working on a project that generally requires three steps: reading, translating (or combining data), and writing these data. For context,…
zukah (46)
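
One pattern that sidesteps concurrent writes to a single HDF file, sketched under the assumption that each key can be processed independently: read and translate in parallel with delayed, but write each result to its own output file (file and key names below are placeholders).

    import pandas as pd
    from dask import delayed, compute

    keys = ['/table_a', '/table_b']           # placeholder HDF keys

    def translate(df):
        return df                             # stand-in for the real transformation

    @delayed
    def process(key):
        df = pd.read_hdf('input.h5', key)     # each task reads its own key
        out = translate(df)
        # one output file per task, so no two tasks write to the same HDF handle
        out.to_hdf(f'output{key.replace("/", "_")}.h5', key=key, mode='w')
        return key

    compute(*[process(k) for k in keys])
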
1 vote · 1 answer

Dask dataframe has no attribute categorize

I am trying to store a Dask dataframe, with a categorical column, to a *.h5 file per this tutorial - 1:23:25 - 1:23:45. Here is my call to a store function: stored = store(ddf,'/home/HdPC/Analyzed.h5', ['Tag']) The function store is: @delayed def…
edesz (11,756)
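
For reference, categorize() is a method of dask.dataframe.DataFrame, not of pandas frames or Delayed objects. A minimal sketch (the 'Tag' column comes from the question; the data and file path are made up):

    import pandas as pd
    import dask.dataframe as dd

    pdf = pd.DataFrame({'Tag': ['a', 'b', 'a', 'c'], 'value': [1, 2, 3, 4]})
    ddf = dd.from_pandas(pdf, npartitions=2)

    # Works on the dask dataframe itself; calling it after wrapping the frame
    # in delayed() (or on a plain pandas frame) raises the AttributeError.
    ddf = ddf.categorize(columns=['Tag'])
    ddf.to_hdf('Analyzed.h5', key='/data')
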
1 vote · 1 answer

Training a Keras model with a Dask array is very slow

I want to use Dask to read a large dataset and feed it to a Keras model. The data consists of audio files, and I am using a custom function to read them. I have tried to apply delayed to this function and I collect all of the files in a dask array,…
jl.da (627)
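
A sketch of the usual way to assemble such an array, with the caveat that every value pulled from it triggers graph execution, which is what makes naive per-sample feeding slow; the loader, shapes, and file names below are assumptions.

    import numpy as np
    import dask.array as da
    from dask import delayed

    @delayed
    def load_audio(path):
        # Placeholder loader; real code would decode the audio file at `path`
        return np.zeros(16000, dtype='float32')

    files = ['a.wav', 'b.wav', 'c.wav']        # hypothetical file list
    samples = [da.from_delayed(load_audio(f), shape=(16000,), dtype='float32')
               for f in files]
    X = da.stack(samples)                      # lazy (n_files, 16000) array

    # Pull whole batches into memory before handing them to Keras; indexing the
    # lazy array sample-by-sample re-runs graph pieces and is what makes training slow.
    batch = X[0:2].compute()
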
1 vote · 0 answers

Dask delayed loop with tuples

How can I properly use dask.delayed for a group-wise quotient calculation over multiple columns? Some sample data: raw_data = { 'subject_id': ['1', '2', '3', '4', '5'], 'name': ['A', 'B', 'C', 'D', 'E'], 'nationality': ['DE',…
Georg Heiler (16,916)
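
A hedged sketch of the general shape of such a loop, assuming each (column, group) computation is independent; the quotient below (group size over total rows) is a stand-in for the question's actual calculation, and the truncated sample data is completed with made-up values.

    import pandas as pd
    from dask import delayed, compute

    # Sample data from the question, with the truncated column filled in arbitrarily
    raw_data = {'subject_id': ['1', '2', '3', '4', '5'],
                'name': ['A', 'B', 'C', 'D', 'E'],
                'nationality': ['DE', 'DE', 'US', 'US', 'US']}
    df = pd.DataFrame(raw_data)

    @delayed
    def group_quotient(frame, column):
        # Placeholder quotient: share of rows per group of `column`
        return (frame.groupby(column).size() / len(frame)).rename(column)

    tasks = [group_quotient(df, col) for col in ['name', 'nationality']]
    results = compute(*tasks)        # tuple of Series, one per column
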
0 votes · 0 answers

What is the grouping I see in the Graph tab on Dask's Dashboard?

For each DTO my Flask web server accepts, I create a new Dask graph and run the graph on the DTO. The runtime of the graph is 8 seconds. A single graph looks like this: When I stream DTOs to the web server, 1 each second, my graphs look like this:…
David (59)