Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create lazy Delayed proxies. Use this tag for questions about that interface.

290 questions
2
votes
1 answer

How to recursively compute Delayed in collection

I am a new user trying out dask delayed. I want to use delayed to automatically turn functions and code into Delayed objects. However, I found that delayed.compute doesn't recursively compute Delayed objects in a collection... from dask import delayed, base @delayed def…
Colin Yu
  • 23
  • 2
1
vote
0 answers

How to parallelize a ML pipeline with Dask or Multithread?

I have a single DataFrame on which I need to train the same model (LogisticRegression) multiple times. _list_scores = [] for i in range(df.shape[0]): X_train = df.iloc[0:i+1, :-1] y_train = df.iloc[0:i+1, -1:] model.fit(X_train,…
1
vote
2 answers

Dask/pandas apply function and return multiple rows

I'm trying to return a dataframe from the dask map_partitions function. The example code I provided returns a 2-row dataframe from the function. However, only 1 row is shown in the end result, which in this case is only the column-name row. I removed…
Sam
  • 338
  • 1
  • 4
  • 17
1
vote
0 answers

Assign delayed objects of variable shape and position to a Dask array

I would like to assign multiple small Dask arrays into parts of one large Dask array. My problem is similar to the one addressed in this post, except my small arrays have a variable shape. My problem is also similar to the one addressed in this…
rybchuk
  • 11
  • 1
1
vote
2 answers

Dask Partitions or Delayed in an NLP Stanza process

I'm working on an NLP process with Stanza. Stanza takes a long time to run the NLP process, and I understand that my problem is quite partitionable. I use these libraries: pip install stanza import stanza stanza.download('es') nlp =…
1
vote
1 answer

Dask (delayed) vs pandas/function returns

I am trying to learn a little about dask as a solution for my parallel computing over some big data I have. I have code where I check a list of transactions and extract the number of active customers in every period (an active customer is a…
FábioRB
  • 335
  • 1
  • 12
1
vote
0 answers

Why is my caching factory method unable to reuse cached objects between Dask Tasks on same Worker?

I'm using Dask distributed to run a processing DAG on a local cluster. As part of that processing, I create DB connection managers to track stuff about processing tasks (poor design choice as it turns out but we're living with it). Early on I…
1
vote
0 answers

Making Dask use references instead of making copies of input?

Let's say I have something like this: def foo(a): return a.sum() x = np.random.rand(1000000,70) X = dask.array.from_array(x) X_list = [dask.delayed(foo)(X) for n in range(600)] Xsums = dask.compute(*X_list) This seems to get hung up in…
TKK
  • 11
  • 2
1
vote
0 answers

dask worker ModuleNotFoundError when import is not in current directory

I have set up the file system as such: \project \something __init__.py some.py (with a function test() defined) run.py And my run.py looks like this: import os import sys import dask from dask.distributed import Client from…
michaelgbj
  • 290
  • 1
  • 10
1
vote
1 answer

Limit memory used by Dask during synchronous computation

I'm trying to use Dask to process a dataset larger than memory, stored in chunks saved as NumPy files. I'm loading the data lazily: array = da.concatenate([ da.from_delayed( dask.delayed(np.load)(path), shape=(size, window_len,…
ondra.cifka
  • 755
  • 1
  • 9
  • 17
1
vote
2 answers

Dask performs recomputation in branched graphs

Suppose I create the following graph: import dask import time @dask.delayed def step_1(): print("Running Step 1") time.sleep(1) return True @dask.delayed def step_2(prev_step): print("Running Step 2") time.sleep(1) return…
Rehan Rajput
  • 112
  • 8
1
vote
1 answer

Dask Delayed with xarray - compute() result is still delayed

I tried to use Dask and xarray to perform some analysis (e.g. avg) over two datasets, then compute the difference between the two results. This is my code: cluster = LocalCluster(n_workers=5, threads_per_worker=3, **worker_kwargs) def calc_avg(path): …
Fab
  • 1,145
  • 7
  • 20
  • 40
1
vote
1 answer

Why is the order not respected in a for loop using dask?

Why, when I run a for-loop in the code below, does dask prefer to do 'Four' first, then 'One', and so on, instead of starting from the first element and finishing with the last? Is it possible that I get some mixed (wrong) results where for example…
sepehr
  • 23
  • 6
1
vote
2 answers

Dask looping over library function call

Goal: I would like to parallelize a loop with dask that uses a library function inside the loop. This function, mhw.detect(), calculates some statistics on a slice of a numpy array. None of the slices of the array depend on the other slices, so I was…
Rachel W
  • 123
  • 1
  • 11
1
vote
1 answer

Using `dask` to fill `boost_histograms` stored in class in parallel

I have a dask-boost_histogram question. I have a code structure as follows: I have a class defined in some script: class MyHist: def __init__(....): self.bh = None def make_hist(...): axis = bh.axis.Regular(....) …