Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface that consists of the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions related to this interface.
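
For example, a minimal sketch of how delayed builds a lazy task graph (function and variable names are illustrative):

    import dask

    @dask.delayed
    def add(x, y):
        # nothing executes here; calling add() only records a task in the graph
        return x + y

    total = add(add(1, 2), 3)   # a Delayed proxy, not a number
    print(total.compute())      # runs the graph and prints 6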

290 questions
0
votes
0 answers

Persisting a dask delayed in memory without starting the computation yet

I have multiple computation trees in my python toolkit, but not all are required for the current analysis: a1 = build_a1().persist() a2 = build_a2(a1).persist() a3 = build_a3(a2) b1 = build_b1().persist() b2 = build_b2(b1).persist() b3 =…
epizut
  • 3
  • 3
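
As background for the question above: on a distributed Client, .persist() submits the work to the cluster immediately (asynchronously) rather than leaving it unstarted. A minimal sketch, with build_a1 as a hypothetical stand-in for the snippet's builders:

    import dask
    from dask.distributed import Client

    client = Client(processes=False)   # small local cluster for illustration

    @dask.delayed
    def build_a1():
        return sum(range(1_000_000))

    a1 = build_a1().persist()   # submitted to the cluster right away, non-blocking
    print(a1.compute())         # blocks and fetches the (possibly finished) result
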
0
votes
1 answer

How to connect to an Oracle database and export the data to CSV format using dask?

How can I connect to an Oracle database using dask, fetch the data from it, and create a CSV file from the fetched data?
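
A minimal sketch of one way to do this, assuming a SQLAlchemy-style Oracle connection string; the URI, table, and column names below are placeholders:

    import dask.dataframe as dd

    # placeholder URI; requires an Oracle driver (e.g. python-oracledb) plus SQLAlchemy
    uri = "oracle+oracledb://user:password@dbhost:1521/?service_name=ORCL"

    df = dd.read_sql_table("my_table", uri, index_col="id", npartitions=8)
    df.to_csv("my_table-*.csv", index=False)   # writes one CSV file per partition
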
0
votes
1 answer

How can I use dask_ml preprocessing in a dask distributed cluster

How can I do dask_ml preprocessing in a dask distributed cluster? My dataset is about 200GB, and every time I categorize the dataset in preparation for OneHotEncoding, it looks like dask is ignoring the client and tries to load the dataset in the local…
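
For reference, a small sketch of the usual dask_ml pattern (Categorizer to make the object columns categorical, then an encoder), shown on a toy frame rather than the 200GB dataset:

    import pandas as pd
    import dask.dataframe as dd
    from dask_ml.preprocessing import Categorizer, DummyEncoder

    pdf = pd.DataFrame({"color": ["red", "blue", "red", "green"], "value": [1, 2, 3, 4]})
    df = dd.from_pandas(pdf, npartitions=2)

    # Categorizer turns object columns into known categoricals;
    # DummyEncoder then one-hot encodes those categorical columns
    encoded = DummyEncoder().fit_transform(Categorizer().fit_transform(df))
    print(encoded.compute())
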
0
votes
1 answer

Datetime index-based slicing with Dask

I have two dataframes: links has two datetime columns called onset and offset, and each row is an event. The other dataframe, called sensors, is indexed with a datetime index of freq 1m and has ~600 columns, one per sensor-id. Essentially, for…
estraven
  • 1
  • 1
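
For context, a small sketch of datetime .loc slicing on a dask dataframe; the frame below is a synthetic stand-in for sensors and the onset/offset values are made up:

    import pandas as pd
    import dask.dataframe as dd

    idx = pd.date_range("2021-01-01", periods=1440, freq="1min")
    sensors = dd.from_pandas(pd.DataFrame({"sensor_0": range(1440)}, index=idx),
                             npartitions=4)

    # with a sorted datetime index and known divisions, .loc slices cheaply
    window = sensors.loc["2021-01-01 06:00":"2021-01-01 07:00"]
    print(len(window.compute()))   # rows between onset and offset, inclusive
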
0
votes
0 answers

Dask cluster not processing any data and just sitting idle after a while, though it was working perfectly fine a couple of weeks before

So I'm trying to parallelize the process using the dask cluster. Here's my attempt. Getting the cluster ready: gateway = Gateway( address="http://traefik-pangeo-dask-gateway/services/dask-gateway", …
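
For reference, a bare-bones sketch of connecting through dask-gateway; the address is the one from the snippet, while auth and cluster options depend on the deployment:

    from dask_gateway import Gateway

    gateway = Gateway(address="http://traefik-pangeo-dask-gateway/services/dask-gateway")
    cluster = gateway.new_cluster()
    cluster.scale(4)                  # ask for four workers
    client = cluster.get_client()     # dask work now runs on the gateway cluster
    print(client.dashboard_link)
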
0
votes
1 answer

BUG: Dask K-means exception: too many indices for array

I am using K-means clustering on a dataset with shape (563, 207383) via Dask K-means (CPU based), and am getting the error "too many indices for array". But when I use the RapidsAI dask_k-means (GPU based) it…
Vivek kala
  • 23
  • 3
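
As a point of reference, a minimal CPU-side sketch with dask_ml's KMeans on a small synthetic array; the real data is (563, 207383), but the sizes here are shrunk so the example runs quickly:

    import dask.array as da
    from dask_ml.cluster import KMeans

    X = da.random.random((563, 2000), chunks=(100, 2000))   # synthetic stand-in

    km = KMeans(n_clusters=8)
    km.fit(X)
    print(km.cluster_centers_.shape)   # (8, 2000)
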
0
votes
1 answer

Nested dask delayed or futures

Looking for best practice for nested parallel jobs. I couldn't nest dask delayed or futures, so I mixed both to get it to work. Is this not recommended? Is there a better way to do this? Example: import dask from dask.distributed import Client import…
J.Sung
  • 27
  • 5
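
One documented way to launch tasks from inside tasks on a distributed cluster is worker_client; a minimal sketch with illustrative function names:

    from dask.distributed import Client, worker_client

    def inner(x):
        return x * 2

    def outer(xs):
        # worker_client lets a running task submit further tasks without deadlocking
        with worker_client() as client:
            futures = client.map(inner, xs)
            return sum(client.gather(futures))

    if __name__ == "__main__":
        client = Client(n_workers=2)
        print(client.submit(outer, [1, 2, 3]).result())   # 12
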
0
votes
1 answer

Creating dask dataframe from delayed dask arrays

I've got a list of delayed dask arrays stored in dask_arr_ls that I want to turn into a dask dataframe. Here's a skeleton of my pipeline: def simulate_device_data(num_id): # create data for unknown number of timestamps data_ls =…
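
One common route, sketched below, is to have each delayed call return a pandas DataFrame and hand the list to dd.from_delayed with an explicit meta; simulate_device_data here is a made-up stand-in matching the snippet. For genuinely array-shaped results, da.from_delayed followed by dd.from_dask_array is the analogous path.

    import numpy as np
    import pandas as pd
    import dask
    import dask.dataframe as dd

    @dask.delayed
    def simulate_device_data(num_id):
        # an unknown number of timestamped readings per device
        n = np.random.randint(50, 150)
        return pd.DataFrame({"device": num_id, "reading": np.random.random(n)})

    delayed_ls = [simulate_device_data(i) for i in range(5)]

    # meta declares column names/dtypes so nothing has to be computed up front
    meta = pd.DataFrame({"device": pd.Series(dtype="int64"),
                         "reading": pd.Series(dtype="float64")})
    df = dd.from_delayed(delayed_ls, meta=meta)
    print(df.head())
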
0
votes
1 answer

Using DASK to read files and write to NEO4J in PYTHON

I am having trouble parallelizing code that reads some files and writes to neo4j. I am using dask to parallelize the process_language_files function (3rd cell from the bottom). I explain the code below, listing out the functions (first 3…
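
For orientation, a stripped-down sketch of a one-file-per-task layout; the connection details, Cypher query, and file names are all placeholders, and the driver is opened inside the task so it never has to be serialized:

    import dask
    from neo4j import GraphDatabase

    @dask.delayed
    def process_language_file(path):
        driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
        with driver, open(path) as fh, driver.session() as session:
            for line in fh:
                session.run("MERGE (w:Word {text: $text})", text=line.strip())
        return path

    files = ["lang_en.txt", "lang_de.txt"]   # placeholder file list
    dask.compute(*[process_language_file(f) for f in files])
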
0
votes
1 answer

Display progress on dask.compute(*something) call

I have the following structure in my code using Dask: @dask.delayed def calculate(data): services = data.service_id prices = data.price return [services, prices] output = [] for qid in notebook.tqdm(ids): r =…
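
For the local (threads/processes) schedulers, the standard tool is dask.diagnostics.ProgressBar around the blocking compute call; on a distributed Client, dask.distributed.progress plays the same role. A minimal sketch with a made-up calculate:

    import dask
    from dask.diagnostics import ProgressBar

    @dask.delayed
    def calculate(x):
        return x * 2

    output = [calculate(i) for i in range(100)]

    with ProgressBar():                  # prints a text progress bar while computing
        results = dask.compute(*output)
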
0
votes
1 answer

How to add/append a row to a particular partition in the dask dataframe?

I want to append a row to a particular partition of a dask dataframe. I have tried many methods, but none of them worked. Can anyone help me with this? Thanks in advance. I tried: first_partition = df.partitions[0] new_dd =…
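
Dask dataframes are not built for in-place row appends; one workaround sketch is to materialize the target partition, append in pandas, and concatenate it back with the untouched partitions:

    import pandas as pd
    import dask.dataframe as dd

    df = dd.from_pandas(pd.DataFrame({"x": range(10)}), npartitions=2)

    new_row = pd.DataFrame({"x": [99]})
    first = pd.concat([df.partitions[0].compute(), new_row], ignore_index=True)

    # rebuild: modified first partition + the remaining original partitions
    df2 = dd.concat([dd.from_pandas(first, npartitions=1), df.partitions[1:]])
    print(df2.compute())
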
0
votes
1 answer

Reading large volume data from Teradata using Dask cluster/Teradatasql and sqlalchemy

I need to read a large volume of data (approx. 800M records) from Teradata. My code works fine for a million records, but for larger sets it takes a long time to build the metadata. Could someone please suggest how to make it faster? Below is the code snippet which I…
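
One knob that often matters here, sketched below, is handing read_sql_table an explicit meta and partitioning so dask does not probe the table to infer the schema itself; the URI, table, and columns are placeholders:

    import pandas as pd
    import dask.dataframe as dd

    uri = "teradatasql://user:password@tdhost"        # placeholder Teradata URI

    # an empty frame describing the expected columns/dtypes, indexed like the result
    meta = pd.DataFrame({"id": pd.Series(dtype="int64"),
                         "amount": pd.Series(dtype="float64")}).set_index("id")

    # explicit meta + npartitions avoids the slow metadata-inference round trips
    df = dd.read_sql_table("transactions", uri, index_col="id",
                           npartitions=100, meta=meta)
    print(df.npartitions)
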
0
votes
1 answer

dask broadcast variable fails with KeyError when calculating a subset of a pandas dataframe

I have a pandas data frame and want to apply a costly operation to each group. Therefore, I want to parallelize this task using dask. The initial data frame should be broadcast. But the computation fails with:
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
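
A minimal sketch of the scatter-with-broadcast pattern; the group keys and the costly operation below are stand-ins:

    import pandas as pd
    from dask.distributed import Client

    def costly_group_op(group_key, frame):
        # operate on the rows of the broadcasted frame belonging to this group
        return frame.loc[frame["key"] == group_key, "value"].sum()

    if __name__ == "__main__":
        client = Client(n_workers=2)
        df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

        # scatter once with broadcast=True so every worker holds a copy,
        # then pass the returned future instead of the frame itself
        df_future = client.scatter(df, broadcast=True)
        futures = [client.submit(costly_group_op, k, df_future) for k in ["a", "b"]]
        print(client.gather(futures))   # [3, 3]
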
0
votes
1 answer

Handling dask delayed failures

How can I port the following function to dask in order to parallelize it? from time import sleep from dask.distributed import Client from dask import delayed client = Client(n_workers=4) from tqdm import tqdm tqdm.pandas() # linear things =…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
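
A common sketch is to wrap the fragile call in a small function that catches the exception and returns a sentinel, so one failing input does not abort the whole batch:

    import dask

    def fragile(x):
        if x == 3:
            raise ValueError("boom")
        return x * 2

    @dask.delayed
    def safe(x):
        # per-item failures become sentinel tuples instead of raised exceptions
        try:
            return ("ok", fragile(x))
        except ValueError as exc:
            return ("failed", x, str(exc))

    print(dask.compute(*[safe(i) for i in range(6)]))
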
0
votes
1 answer

How many dask jobs per worker

If I spin up a dask cluster with N workers and then submit more than N jobs using cluster.compute, does dask try to run all the jobs simultaneously (by scheduling more than 1 job on each worker), or are the jobs queued and run sequentially? My…
firdaus
  • 541
  • 1
  • 6
  • 13
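
In short, a worker runs as many tasks concurrently as it has threads; anything beyond that waits in the scheduler's queue. A tiny sketch:

    import time
    from dask.distributed import Client

    def job(i):
        time.sleep(1)
        return i

    if __name__ == "__main__":
        # 4 workers x 1 thread each => at most 4 tasks in flight at once;
        # the other 16 submissions queue on the scheduler until a slot frees up
        client = Client(n_workers=4, threads_per_worker=1)
        futures = client.map(job, range(20))
        print(sum(client.gather(futures)))   # 190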