Questions tagged [dask-delayed]

dask.delayed is Dask's Python interface built around the delayed function, which wraps a function or object to create lazy Delayed proxies. Use this tag for questions about that interface.

290 questions
1
vote
1 answer

Dask distributed memory and constant pickling/unpickling of results

dask.distributed keeps data in memory on workers until that data is no longer needed. (Thanks @MRocklin!) While this process is efficient in terms of network usage, it will still result in frequent pickling and unpickling of data. I assume this is…
Mark Horvath
  • 1,136
  • 1
  • 9
  • 24
1
vote
1 answer

How can I execute a certain function on each sheet of an xlsx file with more than 100 sheets in parallel?

I have an xlsx file, File.xlsx, with more than 100 sheets. I need to apply a function f() to each sheet's data and finally return a list with each sheet's result appended together. I tried using pandas, reading each sheet's data one by…
1
vote
2 answers

Global cache dict across dask workers

Let's say I have a delayed function which does a certain task but it needs a dict to store intermediate key/value pairs which are read and modified in each dask worker. Can delayed or another mechanism be used to share the cache dict across…
Nathan McCoy
  • 3,092
  • 1
  • 24
  • 46
1
vote
1 answer

Dask multiple clients

Is it possible to have multiple clients in dask? For instance, can I have multiple threads running with one client per thread, so that when one thread blocks, the others can continue? In this case, each client would have separate task graphs that…
jrdzha
  • 161
  • 2
  • 12
1
vote
1 answer

Dask Delayed Error - AttributeError: '_thread._local' object has no attribute 'value'

I've been racking my brain trying to figure out why I cannot execute this parallelizable function on Dask. Essentially I have a function that loads a Keras model (I'm storing the model using MLflow) and then uses the model's predict method on some…
Riley Hun
  • 2,541
  • 5
  • 31
  • 77
1
vote
1 answer

Why does dask delayed do nothing?

I am using Dask to process files line by line. However, Dask does not seem to do anything. My code logic is as follows: import dask from dask import delayed from time import sleep @dask.delayed def inc(x): sleep(1) print(x) def test(): …
DuFei
  • 447
  • 6
  • 20
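The excerpt above is truncated, so its actual cause cannot be confirmed from the listing. One common reason a delayed workload appears to do nothing is that calling a delayed function only builds a task graph, and no work runs until compute is called. A minimal sketch of that behaviour, assuming a simplified inc that returns a value rather than sleeping and printing:

```python
import dask
from dask import delayed


@delayed
def inc(x):
    # Simplified stand-in for the function in the question (assumption).
    return x + 1


# Calling a delayed function only records a task; nothing executes here.
lazy_results = [inc(i) for i in range(3)]

# The work actually runs when compute is called on the lazy objects.
results = dask.compute(*lazy_results)
print(results)
```

dask.compute returns a tuple with one entry per lazy object passed in.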
1
vote
1 answer

Python and Dask - reading and concatenating multiple files

I have some parquet files, all coming from the same domain but with some differences in structure. I need to concatenate all of them. Below are some examples of these files: file 1: A,B True,False False,False file…
Guido Muscioni
  • 1,203
  • 3
  • 15
  • 37
1
vote
1 answer

Dask distributed library giving serialization error

I have initialized the cluster with 10 workers and 4 threads per worker, and I am running this on a 12-core laptop. cluster = makeIndividualDashboard.LocalCluster(n_workers=10, threads_per_worker=4) client =…
1
vote
0 answers

Dask delayed: pass combination of two lists

I have a feeling this should be easily possible, but I fail to pass combinations of (lazy) lists to a delayed function: def test(a,b): return(str(a)+','+str(b)) a = [1,2] #not lazy for example b = [3,4] #not lazy c = dask.delayed(test)(a,b) c =…
Willem
  • 593
  • 1
  • 8
  • 25
1
vote
1 answer

Long running workers blocking GIL timeout errors

I'm using dask-distributed with a local setup (LocalCluster with 5 workers) on a dask.delayed workload. Most of the work is done by the vtk Python bindings. Since vtk is C++ based I think that means the workers don't release the GIL when in a…
Patrick Mineault
  • 741
  • 5
  • 11
1
vote
2 answers

dask.distributed not utilising the cluster

I'm not able to process this block using the distributed cluster. import pandas as pd from dask import dataframe as dd import dask df = pd.DataFrame({'reid_encod':…
Naren Babu R
  • 453
  • 2
  • 9
  • 33
1
vote
2 answers

Dask - Possible to assign dask_key_name to dask dataframe tasks?

In the course of debugging issues, I've found it hard to decipher exactly which tasks are causing problems. I've used the 'dask_key_name' kwarg successfully in delayed tasks to assign a human-readable name to the key for those delayed tasks (based…
dan
  • 183
  • 13
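The delayed usage mentioned in the excerpt above can be sketched as follows. The dask_key_name keyword is passed when calling the delayed function and becomes the task's key; the load function and the "load-orders" name are hypothetical, chosen only for illustration. This does not address the dataframe case the question asks about.

```python
from dask import delayed


def load(table):
    # Hypothetical stand-in for a real loading step.
    return f"rows of {table}"


# dask_key_name sets a human-readable key for this one task.
task = delayed(load)("orders", dask_key_name="load-orders")

print(task.key)
value = task.compute()
print(value)
```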
1
vote
1 answer

Flatten JSON with Dask DataFrames

I'm trying to flatten JSON array objects (not .json files) in Dask dataframes. I have a lot of data and the constantly running processes consume my RAM, so I need a parallel solution. This is the JSON I have: [ { …
1
vote
1 answer

How to find row index for dask array partitions

I have a 2D (4950, 4950) dask array which I want to compute in parallel. Using link: https://docs.dask.org/en/latest/delayed-best-practices.html#don-t-call-dask-delayed-on-other-dask-collections print(da.shape) partitions =…
Manvi
  • 1,136
  • 2
  • 18
  • 41
1
vote
1 answer

Adding labels to a Dask graph

The default graph.visualize() function does not display task labels. Is there any way to add them manually using graph = dask.delayed()(tasks)? I need the labels to show basic business logic (which tables are joined with which, etc.).
Michał Zawadzki
  • 695
  • 6
  • 14