Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions related to that interface.
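
As a quick illustration of what the tag description means, here is a minimal sketch of wrapping a plain function with dask.delayed. The add function and the variable names are made up for this example and do not come from any question listed here.

```python
import dask
from dask.delayed import Delayed


def add(x, y):
    """Plain Python function; nothing Dask-specific in here."""
    return x + y


# Wrapping the function and calling it only records a task.
# add() has not run at this point.
lazy_sum = dask.delayed(add)(1, 2)

# Delayed proxies can be passed to other delayed calls, which chains tasks.
lazy_total = dask.delayed(add)(lazy_sum, 10)

# The value we hold is a Delayed proxy, not an int.
is_proxy = isinstance(lazy_total, Delayed)

# compute() executes the recorded tasks and returns the concrete result.
result = lazy_total.compute()
```

Nothing runs until compute() is called, which is why the intermediate lazy_sum can be handed to another delayed call before it has a value.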

290 questions
3 votes, 1 answer

Computing multiple dask.dataframe.from_delayed() from one source

How can I compute .from_delayed() in parallel from one sequence of delayed? def foo(): df1, df2 = ... # prepare two pd.DataFrame() in one foo() call return df1, df2 dds = [dask.delayed(foo)() for _ in range(5)] # 5 delayed pairs (df1,…
Ilya
3 votes, 1 answer

Is concat in dask dataframe a lazy operation?

I'm reading a list of files using Dask's read_parquet, concatenating those data frames, and writing the result to a file. During the concatenation, does Dask read all the data into memory, or is it loading only the schemas, concatenate(I'm…
Learnis
3 votes, 1 answer

dask.delayed KeyError with distributed scheduler

I have a function interpolate_to_particles written in C and wrapped with ctypes. I want to use dask.delayed to make a series of calls to this function. The code runs successfully without Dask: # Interpolate w/o dask result =…
elltrain
3 votes, 1 answer

Can I use dask.delayed on a function wrapped with ctypes?

The goal is to use dask.delayed to parallelize some 'embarrassingly parallel' sections of my code. The code involves calling a Python function that wraps a C function using ctypes. To understand the errors I was getting, I wrote a very basic…
elltrain
3 votes, 1 answer

Best way to parallelize computation over dask blocks that do not return np arrays?

I'd like to return a dask dataframe from an overlapping dask array computation, where each block's computation returns a pandas dataframe. The example below shows one way to do this, simplified for demonstration purposes. I've found a combination…
HoosierDaddy
3 votes, 3 answers

AttributeError: module 'dask' has no attribute 'delayed'

Using PyCharm Community 2018.1.4, Python 3.6, Dask 2.8.1. I'm trying to implement dask delayed on some of my methods and getting the error AttributeError: module 'dask' has no attribute 'delayed'. This is obviously not true, so I am wondering what I am…
Sherry
3 votes, 1 answer

Dask - How to cancel and resubmit stalled tasks?

Frequently, I encounter an issue where Dask randomly stalls on a couple tasks, usually tied to a read of data from a different node on my network (more details about this below). This can happen after several hours of running the script with no…
dan
3 votes, 1 answer

How to use group by describe with unstack operation in python dask?

I am trying to use the describe() and unstack() functions in Dask to get summary statistics of the data. However, I get the error shown below: import dask.dataframe as dd df =…
The Great
3 votes, 1 answer

Dask - Quickest way to get row length of each partition in a Dask dataframe

I'd like to get the length of each partition in a number of dataframes. I'm presently getting each partition and then getting the size of the index for each partition. This is very, very slow. Is there a better way? Here's a simplified snippet of…
dan
3 votes, 1 answer

Dask map_blocks - IndexError: tuple index out of range

I want to do the following with Dask: load a matrix from an HDF5 file, then parallelize the calculation of each entry. Here is my code: def blocked_func(x): return np.random.random() with h5py.File(file_path) as f: d = f['/data'] arr =…
Andy R
3 votes, 1 answer

How can I get result of Dask compute on a different machine than the one that submitted it?

I am using Dask behind a Django server and the basic setup I have is summarised here: https://github.com/MoonVision/django-dask-demo/ where the Dask client can be found here:…
Matt Nicolls
3 votes, 2 answers

Euclidean distance calculation using Python and Dask

I'm attempting to identify elements in the Euclidean distance matrix that fall under a certain threshold. I then take the positional arguments for this search and use them to compare elements in a second array (for the sake of demonstration this array…
3 votes, 1 answer

Dask lazy initialization very slow for list comprehension

I'm trying to see if Dask would be a suitable addition to my project and wrote some very simple test cases to look into its performance. However, Dask is taking a relatively long time to simply perform the lazy initialization. @delayed def…
ltt
3 votes, 1 answer

How should I write multiple CSV files efficiently using dask.dataframe?

Here is a summary of what I'm doing. At first, I do this with plain multiprocessing and the pandas package: Step 1. Get the list of file names that I'm going to read: import os files = os.listdir(DATA_PATH + product) Step 2. Loop over the…
TianYu Jiang
3 votes, 0 answers

Implementation of a recursive function using dask.delayed

How can I implement merge sort using dask.delayed, or some other Dask API, so that it becomes faster through parallelism?
Dhruv Kumar
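
One possible way to approach the merge sort question above, offered as a sketch rather than a definitive answer: build the recursion as a tree of delayed tasks, sorting small chunks directly and merging the halves lazily. The names merge and merge_sort_delayed, the cutoff parameter, and the sample data are all assumptions made for this illustration.

```python
import heapq

import dask


def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    return list(heapq.merge(left, right))


def merge_sort_delayed(values, cutoff=4):
    """Build a lazy task graph that merge-sorts `values`.

    Chunks of at most `cutoff` items are sorted in a single task so the
    graph does not explode into one task per element.
    """
    if len(values) <= cutoff:
        return dask.delayed(sorted)(values)
    mid = len(values) // 2
    left = merge_sort_delayed(values[:mid], cutoff)
    right = merge_sort_delayed(values[mid:], cutoff)
    # The two halves are independent tasks, so a scheduler may run them
    # in parallel before the merge task executes.
    return dask.delayed(merge)(left, right)


data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
sorted_data = merge_sort_delayed(data).compute()
```

Whether this is actually faster depends on the workload. For pure-Python comparisons on small lists, task overhead and the GIL under the default threaded scheduler can easily outweigh any gain, so a larger cutoff is usually worth trying.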