Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create lazy Delayed proxies. Use this tag for questions about that Python interface.
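
As a rough illustration of what wrapping a function looks like, here is a minimal sketch. It assumes dask is installed; the `add` helper and the inputs 3 and 2 are only illustrative (they echo one of the questions listed under this tag), not part of any official example.

```python
from dask import delayed


def add(a, b):
    # Plain Python function; nothing dask-specific here.
    return a + b


# Wrapping with delayed() returns a Delayed proxy instead of running add().
x = delayed(add)(3, 2)

# Delayed proxies can be passed to other delayed calls to build a task graph.
y = delayed(add)(x, x)

# Nothing has executed so far; compute() runs the graph and returns the value.
result = y.compute()
print(result)
```

Until `compute()` is called, `x` and `y` are only descriptions of work to be done, which is why questions under this tag often revolve around when and how the graph is actually executed.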

290 questions
1
vote
2 answers

Efficient way to compute difference of all rows in dask dataframe

I am looking for an efficient way to compute the difference of all rows in a dask dataframe (preferred), or any efficient way in a pandas df. I have a huge file with millions of rows, and it is taking forever to compute this. Below is an example: d = {'A': [5, 9,…
Manvi
  • 1,136
  • 2
  • 18
  • 41
1
vote
1 answer

Dask Distributed - Recommended package versions

I recently tried upgrading my Dask Distributed to 2.3.0 and fastparquet to 0.3.2, but found that it was leading to deadlocks and worker dropouts on code that works with Dask Distributed 2.1.0 (I want to make use of the read_parquet feature…
dan
  • 183
  • 13
1
vote
1 answer

Frequent KilledWorker: pandas_read_text-read-block-from-delayed

I have a standard cluster setup on kubernetes using the dask docker images but not using the dask helm charts. I tried running an existing script on the cluster, but it doesn’t seem to run. It keeps throwing errors. The cluster details: 1 notebook, 1…
1
vote
1 answer

Why does the __bool__ built-in function have to raise an exception on dask.delayed objects?

I am trying to run a DAG of tasks using the dask API for my specific application. To put it in a contrived example, I want tasks to pass out their success/failure flags and use those as the input to other tasks. However, dask does not let me do __bool__…
1
vote
1 answer

Delayed dask.dataframe.DataFrame.to_hdf computations crashing

I'm using Dask to execute the following logic: read in a master delayed dd.DataFrame from multiple input files (one pd.DataFrame per file); perform multiple query calls on the master delayed DataFrame; use DataFrame.to_hdf to save all dataframes…
ddavis
  • 337
  • 5
  • 15
1
vote
1 answer

Why do I sometimes have to call compute() twice on dask delayed functions?

I'm working with dask delayed functions and I'm getting familiar with the do's and don'ts when using the @dask.delayed decorator on functions. I realized that sometimes I will need to call compute() twice to get the result despite the fact that I…
Yilie Ma
  • 35
  • 5
1
vote
1 answer

How to convert from HighLevelGraph to regular dask dict

I have some code like: x = delayed(add)(3, 2) y = delayed(add)(x, x) and I want to get the resulting dask graph as: dsk = { 'x': (add, 3, 2), 'y': (add, 'x', 'x') } But instead I am getting a HighLevelGraph as follows: {'x': {'x': (add, 3,…
1
vote
1 answer

Load images into a Dask Dataframe

I have a dask dataframe which contains image paths in a column (called img_paths). What I want to do in the next steps is to load images using those image paths into another column (called img_loaded), followed by applying some pre-processing…
Sanchit
  • 3,180
  • 8
  • 37
  • 53
1
vote
1 answer

Why doesn't dask execute in parallel

Could someone point out what I did wrong with the following dask implementation, since it doesn't seem to use multiple cores? [Updated with reproducible code] The code that uses dask: bookingID = np.arange(1,10000) book_data =…
hudarsono
  • 389
  • 4
  • 19
1
vote
1 answer

How to interpret suffix numbers in Dask visualisation?

When using dask to visualise a graph, the produced graph has 2 kinds of nodes: square nodes and circular nodes. Generally speaking, square nodes seem to be actual values, while the circles are functions producing those values. However these…
CMCDragonkai
  • 6,222
  • 12
  • 56
  • 98
1
vote
1 answer

How to queue dask delayed on each worker to allow sequential execution of a process?

I need each worker to process a single task at a time and finish the current process before starting a new one. I cannot manage to: (1) have at most one task running at any moment on each worker, (2) make a worker finish a procedure before starting a…
mathdugre
  • 98
  • 7
1
vote
1 answer

How do I run a group of nodes together with Dask

I have an image processing graph and I want to process many images in batch. My graph looks like the following: When I run the graph, bokeh shows the execution path like this: This causes my machines to run out of memory and crash as the output of…
Matt Nicolls
  • 173
  • 1
  • 7
1
vote
1 answer

Launch function on cluster with DASK

I am new to DASK and would like to test running DASK on a cluster. The cluster has a head server and several other nodes. I can reach the other nodes with a simple ssh without a password once I log in to the head server. I would like to…
1
vote
1 answer

Use already done computation wisely

Say I've got a dask dataframe df and I apply some computation to it. Mathematically: df1 = f1(df), df2 = f2(df1), df3 = f3(df1). If I run df2.compute() and after that run df1.compute(), how can I stop dask from recomputing the result of…
Dhruv Kumar
  • 399
  • 2
  • 13
1
vote
1 answer

Need clarity on copying a dask.dataframe

Can the pandas.DataFrame.copy API be exactly imitated in dask.DataFrame using the following code? from copy import copy df2 = copy(df) Is it a simple copy or a deep copy? How can I do the other type of copy? Or do I necessarily need to do the…
Dhruv Kumar
  • 399
  • 2
  • 13