Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface that consists of the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions related to the Python interface.
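A minimal sketch of the interface described above, assuming a local dask installation. `inc` and `add` are illustrative functions, not part of dask.

```python
from dask import delayed

# Decorating a function with delayed makes calls lazy: each call returns a
# Delayed proxy that records the task instead of running it.
@delayed
def inc(x):
    return x + 1

@delayed
def add(x, y):
    return x + y

a = inc(1)         # Delayed proxy, nothing computed yet
b = inc(2)
total = add(a, b)  # builds a small task graph: inc, inc -> add

# compute() walks the graph and returns the concrete result.
result = total.compute()
print(result)  # 5
```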

290 questions
0 votes · 1 answer

Dask scheduler empty / graph not showing

I have a setup as follows: # etl.py from dask.distributed import Client import dask from tasks import task1, task2, task3 def runall(**kwargs): print("done") def etl(): client = Client() tasks = {} tasks['task1'] =…
ic_fl2
0 votes · 1 answer

Sharing a Dask cluster between projects with different module versions?

I have a situation where multiple different Flask apps are infrequently used for real-time statistic computations. In this case I need to have good performance when somebody is browsing one of the apps, and at the moment I have a nice and expensive…
Federico Bonelli
0 votes · 1 answer

DASK Delayed and modules with multiple return variables

I want to use the delayed function from Dask. Unfortunately, how to use delayed on modules with multiple return values is not clear to me. For example, if I run the following snippet, there is no way to point to the first return value of the inc module…
Jimbo
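For the multiple-return-value question above, one option is the `nout` argument of `dask.delayed`, which lets a Delayed result be unpacked like a tuple. A sketch, where `inc_and_double` is a hypothetical function; indexing (`result[0]`) on a Delayed created without `nout` is another route.

```python
import dask

# Hypothetical function returning two values.
def inc_and_double(x):
    return x + 1, x * 2

# nout=2 tells dask the call returns a 2-tuple, so the Delayed result
# can be unpacked into two separate Delayed objects.
first, second = dask.delayed(inc_and_double, nout=2)(10)
only_first = first + 100  # downstream work can depend on just one output

# Compute all three lazily built values in one pass over the graph.
r_first, r_second, r_only = dask.compute(first, second, only_first)
print(r_first, r_second, r_only)  # 11 20 111
```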
0 votes · 1 answer

Select dimensions by name from a dask chunk

I have some ensemble files in GRIB format that I would like to lazy load in Python using dask and xarray. Based on https://climate-cms.org/2018/09/14/dask-era-interim.html, I managed to lazy load the files as intended, but now I want to slice and…
OOM
0 votes · 1 answer

How to improve efficiency on parallel loops in Python

I'm intrigued by how much less efficient parallel loops in Python are compared to parfor in MATLAB. Here I present a simple root-finding problem, brute-forcing 10^6 initial guesses between a and b. import numpy as np from scipy.optimize…
0 votes · 1 answer

How to use dask.delayed correctly

I did a timing experiment and I don't believe I'm using dask.delayed correctly. Here is the code: import pandas as pd import dask import time def my_operation(row_str: str): text_to_add = 'Five Michigan State University students—Ash Williams,…
aclifton
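For the timing questions above, a frequent cause of slow `dask.delayed` code is creating one tiny task per row, so scheduling overhead dominates. A sketch of batching rows into larger tasks, assuming a hypothetical `my_operation` and an arbitrary `batch_size`; for pure-Python, GIL-bound work, the process-based scheduler (`scheduler="processes"`) may be needed to see a real speedup.

```python
import dask
import pandas as pd

# Hypothetical per-row work standing in for the question's my_operation.
def my_operation(row_str: str) -> str:
    return row_str.upper()

# Process a whole batch of rows inside one task, so scheduling overhead
# is paid once per batch instead of once per row.
def process_batch(rows):
    return [my_operation(r) for r in rows]

df = pd.DataFrame({"text": [f"row {i}" for i in range(1000)]})
batch_size = 250  # assumption: tune so each task does meaningful work

tasks = [
    dask.delayed(process_batch)(df["text"].iloc[i:i + batch_size].tolist())
    for i in range(0, len(df), batch_size)
]

# One compute call for all batches lets dask schedule them in parallel.
batches = dask.compute(*tasks)
results = [item for batch in batches for item in batch]
print(len(tasks), len(results))  # 4 1000
```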
0 votes · 0 answers

Best practice for loading large dataset and using dask.delayed

I have a csv file of 550,000 rows of text. I read it into a pandas dataframe, loop over it, and perform some operation on it. Here is some sample code: import pandas as pd def my_operation(row_str): #perform operation on row_str to create…
aclifton
0 votes · 1 answer

String Data transformation Processing 300MB having 30M records in Dask Distributed

Starting the Dask scheduler on Node1 (4 CPU, 8 GB): dask-scheduler --host 0.0.0.0 --port 8786. Starting workers on Node2 (8 CPU, 32 GB) and Node3 (8 CPU, 32 GB): dask-worker tcp://xxx.xxx.xxx.xxx:8786 --nanny-port 3000:3004…
0 votes · 0 answers

use ast.literal_eval with dask Series

I have strings with the format "[[Integer1, tag1], [Integer2, tag2]]" as values in a Dask Series and want to use df[col] = df[col].apply(ast.literal_eval) to convert these into normal list values within that Series. The length of this list value can…
goku
0 votes · 0 answers

Dask Delayed function progressively getting slower on each call. Not a memory issue

I have been working on a python script using dask to speed up the processing time. At a high level, the script calls a dask delayed function a number of times to perform new computations. Each time the dask delayed function is called, it has no…
0 votes · 1 answer

Different resource allocations for each of the task in a list

I have a list of independent tasks and each needs different resources and takes different calculation times. I have to specify resource constraints on each of these tasks in the list and set the priority for the task with the least amount of…
ranjith
0 votes · 1 answer

Using Dask to download, process, and save to csv

Problem: Part of my workflow involves downloading hundreds of thousands of files, parsing the data, and then saving it to CSV locally. I'm trying to set this workflow up with Dask, but it does not appear to be processing in parallel. The Dask dashboard shows…
Vedda
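For the download-parse-save question above, one common reason work runs serially is calling `.compute()` inside the loop. A sketch of building the whole graph first and computing once; `download`, `parse`, and `save` are hypothetical stand-ins that do no real network or file I/O.

```python
import dask

# Hypothetical stand-ins: a real version would fetch over the network,
# parse the payload, and write a CSV file.
def download(url):
    return f"{url},42"

def parse(raw):
    name, value = raw.split(",")
    return {"name": name, "value": int(value)}

def save(record):
    return f"{record['name']}.csv"  # pretend output path; nothing is written

urls = [f"file{i}" for i in range(5)]

# Build the whole graph first, chaining delayed calls per file...
saved = []
for url in urls:
    raw = dask.delayed(download)(url)
    rec = dask.delayed(parse)(raw)
    saved.append(dask.delayed(save)(rec))

# ...then compute once, so independent files can run in parallel.
# Calling .compute() inside the loop would process one file at a time.
paths = dask.compute(*saved)
print(paths)
```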
0 votes · 0 answers

Using Dask to convert a double List comprehension

Hope all is well! I have what is hopefully a simple question. I am trying to convert a double list comprehension in my code into something Dask can handle (so that it runs faster). The code looks like the following: np.array([np.array([x[1] for x…
0 votes · 1 answer

Debugging very slow `from_delayed` call

I have a long-ish chained Dask pipeline, and one of the last bits is a string of dask.dataframe.from_delayed calls like below. That line is extremely slow: many minutes per call. It takes 1-2 hours just to set up the pipeline. When I debug the…
HoosierDaddy
0 votes · 1 answer

Tasks management and monitoring within python script via dask

I have a project folder with many subfolders (say 100). The Python script navigates to each of these subfolders, calls an executable, writes the results to an output file, and moves on to the next subfolder. Here is my Python script from…
ranjith