Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions related to that interface.

290 questions
0
votes
0 answers

Reshape multiple files in DASK with many different columns

I want to reshape multiple files in Dask with many different columns. Code: def convert_1d_to_2d(l, cols): return [l[i:i + cols] for i in range(0, len(l), cols)] def read_excel(inputs, **kwargs): return from_map(pd.read_excel, inputs,…
0
votes
0 answers

Worker killed during running Dask on reading file SFTP

I am trying to read a CSV from a remote server using SFTP in Dask. Below is my code snippet: import time from dask.distributed import Client, Future import dask.dataframe as dd import pandas as pd import asyncio import dask password =…
0
votes
0 answers

Dask Delayed not actually parallel or faster than serial

Dask.delayed isn't parallelizing, or at least it's not faster than serial. Using their example (found at https://docs.dask.org/en/stable/delayed.html) except replacing "data" with a longer list, the process from start to finish takes over 40 minutes…
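
For context, here is a minimal sketch of the pattern from the linked documentation page, adapted to a longer input list. The functions inc, double, and add follow that page; the list length, the batch size, and the helper process_batch are assumptions added for illustration. When each task does only trivial work, per-task scheduling overhead can outweigh any parallel gain, and grouping many items into one task is one common mitigation.

```python
import dask
from dask import delayed

def inc(x):
    return x + 1

def double(x):
    return x * 2

def add(x, y):
    return x + y

def process_batch(batch):
    # Hypothetical helper: handle a whole chunk inside a single task
    return sum(add(inc(x), double(x)) for x in batch)

data = list(range(10_000))  # assumed size of the "longer list"

# Build one task per chunk instead of several tasks per element
batch_size = 1_000  # assumed
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
partials = [delayed(process_batch)(b) for b in batches]
total = delayed(sum)(partials)

result = total.compute(scheduler="threads")

# Plain serial reference for comparison
serial = sum(add(inc(x), double(x)) for x in data)
```

Note that for pure-Python, CPU-bound functions the threaded scheduler is also constrained by the GIL, so batching alone may not produce a speedup.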
0
votes
1 answer

Logs are shown immediately after submitting a job via the client when using Dask

The logs of the function submitted via the client are immediately displayed. Instead, the logs are expected to be displayed on client.gather(futures). The expected behavior could be achieved using Delayed but not using Futures. Here is the code to…
Roxy
  • 1,015
  • 7
  • 20
0
votes
0 answers

How to use dask dataframe instead of pandas to make a faster calculation

demo csv file: label1 label2 m1 0 KeyT1_L1_1_animebook0000_1 KeyT1_L1_1_animebook0000_1 0.000000 1 KeyT1_L1_1_animebook0000_1 KeyT1_L1_1_animebook0001_1 1.000000 2 KeyT1_L1_1_animebook0000_1 …
neo
  • 55
  • 8
0
votes
1 answer

'DataFrame' object has no attribute 'to_delayed'?

I am using a random forest model from scikit-learn and BlockwiseVotingRegressor from Dask. Code: Error:
0
votes
1 answer

How to use Dask for a function with no return value? (image processing)

I have a function crop_images_circle(file_dir,kmeans_dir,folders_dir,filename) that does not return anything. I am trying to use Dask to parallelize the computation. Implementation without Dask for some 100-odd files: for filename in…
Sushant
  • 160
  • 2
  • 10
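
A minimal sketch of how a function with side effects only can be run through dask.delayed. The function write_marker, the file names, and the temporary output directory are hypothetical stand-ins, not the asker's crop_images_circle. Each delayed call yields a task, and dask.compute runs them all; the returned values are simply None.

```python
import os
import tempfile

import dask
from dask import delayed

def write_marker(out_dir, filename):
    # Hypothetical stand-in for a function that only has side effects
    # (here: writing one small file) and returns None
    with open(os.path.join(out_dir, filename + ".done"), "w") as fh:
        fh.write("processed")

out_dir = tempfile.mkdtemp()
filenames = [f"image_{i}" for i in range(8)]  # assumed file names

# Nothing runs yet: each call only records a task
tasks = [delayed(write_marker)(out_dir, name) for name in filenames]

# Run all tasks; the result is a tuple with one None per task
results = dask.compute(*tasks, scheduler="threads")

written = sorted(os.listdir(out_dir))
```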
0
votes
1 answer

How to create a writable shared array in Dask

I'm new to Dask. What I'm trying to find is a "shared array between processes that is writable by any process". Could someone show me that, or a way to implement a shared writable array in Dask?
0
votes
1 answer

Fast method to match geospatial datasets in Python

I have a set of 2000 geospatial points (lon/lat), which I need to match with several other geospatial datasets (I am using Geopandas GeoDataFrames). I am using the sklearn BallTree function to find the neighbors within a certain radius of each point…
0
votes
0 answers

scipy interpolation method using Dask dataframe

I have read a bunch of Dask examples, from people's GitHub code and from the Dask issues, but I still have a problem using SciPy interpolation with Dask parallel computing and am hoping someone here can help me solve it. I actually have issue in how…
Franke Hsu
  • 190
  • 1
  • 2
  • 15
0
votes
1 answer

Dask @delayed converts dataframes to pandas

I have this code that calls a Dask @delayed function that takes N Dask dataframes as input and returns a Dask dataframe as output. There are two problems: (1) inside the function the type of the dataframe is pandas instead of Dask, and (2) when I get…
ps0604
  • 1,227
  • 23
  • 133
  • 330
0
votes
1 answer

Parallel computing for loop with no last function

I'm trying to parallelize reading the content of 16 gzip files with this script: import gzip import glob from dask import delayed from dask.distributed import Client, LocalCluster @delayed def get_gzip_delayed(gzip_file): with…
Oliver
  • 281
  • 3
  • 14
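
A minimal sketch of computing several independent delayed tasks when there is no final reducing function. The helper read_gzip and the generated sample files are assumptions for illustration, since the asker's function body is not shown. Passing all tasks to a single dask.compute call lets the scheduler run them together and returns one result per task.

```python
import gzip
import os
import tempfile

import dask
from dask import delayed

# Create a few small gzip files so the sketch is self-contained
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(4):
    path = os.path.join(tmp_dir, f"part_{i}.gz")
    with gzip.open(path, "wt") as fh:
        fh.write(f"line from file {i}\n")
    paths.append(path)

@delayed
def read_gzip(path):
    # Hypothetical reader: return the decompressed text of one file
    with gzip.open(path, "rt") as fh:
        return fh.read()

tasks = [read_gzip(p) for p in paths]

# No final function is needed: compute all tasks in one call
contents = dask.compute(*tasks, scheduler="threads")
```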
0
votes
0 answers

Parallelizing a for loop with PyTorch Tensor operations

I am loading my training images into a PyTorch dataloader, and I need to calculate the input image's stats. The calculation is taken directly from https://kozodoi.me/python/deep%20learning/pytorch/tutorial/2021/03/08/image-mean-std.html. T =…
0
votes
1 answer

Dask run all combination of elements in different lists in parallel

I'm trying to run a function on different combination of all the elements in different arrays with dask, and I'm struggling to apply it. The serial code is as below: for i in range(5): for j in range(5): for k in range(5): …
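
A minimal sketch of one way to flatten nested loops like these into independent tasks, using itertools.product. The function combine is a hypothetical placeholder, since the body of the asker's loop is not shown.

```python
from itertools import product

import dask
from dask import delayed

def combine(i, j, k):
    # Hypothetical placeholder for the work done in the innermost loop
    return i + j + k

# One task per (i, j, k) combination, in the same order as the nested loops
combos = list(product(range(5), range(5), range(5)))
tasks = [delayed(combine)(i, j, k) for i, j, k in combos]

results = dask.compute(*tasks, scheduler="threads")

# Pair each combination with its result for later lookup
by_combo = dict(zip(combos, results))
```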
0
votes
0 answers

How to parallelize a function taking two arguments and returning a dictionary in Dask

I have a function batch_opt taking two arguments (an integer i and a pandas dataframe train) and returning a Python dictionary. When I tried to parallelize the computation using Dask in Python, I got the type error "Delayed objects are immutable". I…
Undecided
  • 611
  • 8
  • 13
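
A minimal sketch of collecting dictionaries from delayed calls. The body of batch_opt and the plain dict used in place of the pandas dataframe are hypothetical, since the asker's code is not shown. One possible cause of this error (an assumption here) is assigning into a Delayed object before it has been computed; the sketch avoids that by computing first and only then working with the concrete dictionaries.

```python
import dask
from dask import delayed

def batch_opt(i, train):
    # Hypothetical body: return a small dictionary per batch index
    return {"batch": i, "total": sum(train["value"]) * i}

# Stand-in for the pandas dataframe, kept as a plain dict for brevity
train = {"value": [1, 2, 3]}

# Build the tasks without modifying any Delayed object
tasks = [delayed(batch_opt)(i, train) for i in range(4)]

# Compute first, then work with the ordinary dictionaries
outputs = dask.compute(*tasks, scheduler="threads")
merged = {out["batch"]: out["total"] for out in outputs}
```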