Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create lazy Delayed proxies. Use this tag for questions related to this interface.

290 questions
3 votes, 1 answer

File Not Found Error in Dask program run on cluster

I have 4 machines: M1, M2, M3, and M4. The scheduler, client, and one worker run on M1; the rest of the machines are workers. I've put a CSV file on M1. When I run the program, dask's read_csv gives me a file-not-found error
Dhruv Kumar
3 votes, 1 answer

Can we create a Dask cluster with both multiple CPU machines and multiple GPU machines?

Can we create a dask cluster with some CPU and some GPU machines together? If yes, how do we control that a certain task runs only on a CPU machine, some other type of task runs only on a GPU machine, and, if not specified, it picks whichever…
TheCodeCache
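Yes, via abstract worker "resources" in dask.distributed. A hedged sketch: the --resources flags in the comments are how workers would advertise a GPU, and dask.annotate pins tasks to them; the local scheduler used here to keep the sketch runnable simply ignores the annotation:

```python
import dask
from dask import delayed

def cpu_task(x):
    return x + 1

def gpu_task(x):
    return x * 2

# With dask.distributed, workers advertise abstract resources at startup:
#   dask worker scheduler:8786 --resources "GPU=1"   # on GPU machines
#   dask worker scheduler:8786                        # on CPU-only machines
# Tasks annotated with a resource only run on workers that advertise it;
# unannotated tasks may run anywhere.
with dask.annotate(resources={"GPU": 1}):
    g = delayed(gpu_task)(10)

c = delayed(cpu_task)(g)

# Annotations constrain placement only on a distributed cluster; the
# default local scheduler ignores them, so this sketch runs anywhere.
print(c.compute())  # 21
```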
3 votes, 1 answer

Using dask delayed with functions returning lists

I am trying to use dask.delayed to build up a task graph. This mostly works quite nicely, but I regularly run into situations like this, where I have a number of delayed objects that have a method returning a list of objects of a length that is not…
tt293
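Two common workarounds for delayed results of unknown length, sketched: nout when the length is known up front, and a two-stage compute when it is not:

```python
import dask
from dask import delayed

def make_items(n):
    return list(range(n))

# A Delayed wrapping a list cannot be iterated lazily when its length is
# unknown at graph-construction time.

# 1) If the length IS known up front, nout makes the result unpackable:
a, b, c = delayed(make_items, nout=3)(3)
print(dask.compute(a, b, c))  # (0, 1, 2)

# 2) Otherwise, compute the list in a first pass, then build the rest of
# the graph from the concrete result (a two-stage pattern):
items = delayed(make_items)(5).compute()
squares = [delayed(lambda x: x * x)(i) for i in items]
print(dask.compute(*squares))  # (0, 1, 4, 9, 16)
```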
3 votes, 1 answer

Reading LAZ to Dask dataframe using delayed loading

Action: reading multiple LAZ point cloud files into a Dask DataFrame. Problem: unzipping LAZ (compressed) to LAS (uncompressed) requires a lot of memory, and the varying file sizes and multiple processes created by Dask result in MemoryErrors. Attempts: I tried…
Tom Hemmes
2 votes, 2 answers

Dask dataframe parallel task

I want to create features (additional columns) from a dataframe, and I have the following structure for many functions. Following this documentation https://docs.dask.org/en/stable/delayed-best-practices.html I have come up with the code…
J.Ewa
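The pattern from that best-practices page, reduced to a toy example: wrap each feature function in delayed and evaluate them in a single dask.compute call so they run in parallel:

```python
import pandas as pd
import dask

df = pd.DataFrame({"a": [1, 2, 3]})

# Each feature function becomes one task; a single dask.compute call
# evaluates them together instead of one .compute() per feature.
@dask.delayed
def feat_double(d):
    return d["a"] * 2

@dask.delayed
def feat_square(d):
    return d["a"] ** 2

double, square = dask.compute(feat_double(df), feat_square(df))
out = df.assign(double=double, square=square)
print(out["double"].tolist())  # [2, 4, 6]
```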
2 votes, 0 answers

How can I make my code run in parallel with dask?

First import some packages: import numpy as np from dask import delayed Suppose I have two NumPy arrays: a1 = np.ones(5000000) a2 = np.ones(8000000) I would like to show the sum and length of the two arrays, and the functions are shown as: def…
Liang Ce
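A minimal sketch of that setup: wrap the functions (not the arrays) in delayed and run all four tasks through one dask.compute:

```python
import numpy as np
import dask
from dask import delayed

a1 = np.ones(5_000)  # smaller stand-ins for the 5M/8M arrays
a2 = np.ones(8_000)

# Wrapping np.sum and len makes each call one task; a single
# dask.compute runs the four tasks in parallel.
dsum = delayed(np.sum)
dlen = delayed(len)

results = dask.compute(dsum(a1), dlen(a1), dsum(a2), dlen(a2))
print(results)  # (5000.0, 5000, 8000.0, 8000)
```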
2 votes, 1 answer

What is the best way to lag a value in a Dask Dataframe?

I have a Dask Dataframe called data which is extremely large, cannot fit into main memory, and, importantly, is not sorted. The dataframe is unique on the following key: [strike, expiration, type, time]. What I need to accomplish in Dask is the…
2 votes, 1 answer

Can Dask automatically create a tree to parallelize a computation and reduce the copies between workers?

I've structured this in two sections, BACKGROUND and QUESTION. The Question is all the way at the bottom. BACKGROUND: Suppose I want to (using Dask distributed) do an embarrassingly parallel computation like summing 16 gigantic dataframes. I know…
user5406764
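Dask will not restructure a flat sum(...) into a tree on its own, but a manual tree reduction over delayed objects is short. A sketch with integers standing in for the 16 gigantic dataframes:

```python
from dask import delayed

# Manual tree reduction: pair up partial sums so no single task receives
# all 16 inputs at once, which also spreads the work across workers.
def tree_sum(items):
    items = list(items)
    while len(items) > 1:
        pairs = [delayed(lambda a, b: a + b)(items[i], items[i + 1])
                 for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:          # an odd element carries to the next round
            pairs.append(items[-1])
        items = pairs
    return items[0]

parts = list(range(16))             # stand-ins for the 16 dataframes
total = tree_sum(parts)
print(total.compute())  # 120
```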
2 votes, 1 answer

Get PARTITION_ID in Dask for Data Frame

Is it possible to get the partition_id in dask after splitting pandas DFs? For example: import dask.dataframe as dd import pandas as pd df = pd.DataFrame(np.random.randn(10,2), columns=["A","B"]) df_parts = dd.from_pandas(df, npartitions=2) part1 =…
data_person
2 votes, 1 answer

cluster.adapt() kills workers before moving their in-memory data to others

I am using Dask with Slurm cluster: cluster = SLURMCluster(cores=64, processes=64, memory="128G", walltime="24:00:00") #export DASK_DISTRIBUTED__SCHEDULER__ALLOWED_FAILURES=100 cluster.adapt(minimum_jobs=1, maximum_jobs=2, interval="20 s",…
2 votes, 1 answer

Dask hanging when called from command prompt

I have a program that runs as expected in a Jupyter Notebook cell, but fails or hangs when put into a Python file and called from either a Jupyter Notebook or the command line. Here is the test code: import pandas as pd …
mpLoNsTa
2 votes, 1 answer

How to load my train.tfrecord files in saturn cloud for running via Dask?

I am working on object detection and I have two record files: Train.tfrecord (1.6 GB) and Test.tfrecord (65 MB). How do I load the training file in Saturn Cloud, as I want to speed up the training using Dask in Saturn Cloud?
uNIKx
2 votes, 1 answer

KilledWorker error in dask when doing embarrassingly parallel data concatenation

I have an embarrassingly parallel workload where I am reading a group of parquet files, concatenating them into bigger parquet files, and then writing them back to disk. I am running this on a distributed cluster (with a distributed filesystem)…
2 votes, 1 answer

Why does dask.delayed take longer than serial code when working with networkx?

I would like to speed up the execution of a function my_func() using parallel computation with dask.delayed. In a loop over 3 dimensions, my_func() extracts a value from an iris.cube.Cube (which is essentially a dask.array loaded from a file outside…
2 votes, 1 answer

Dask high memory usage when computing two values with common dependency

I am using Dask on a single machine (LocalCluster with 4 processes, 16 threads, 68.56GB memory) and am running into worker memory problems when trying to compute two results at once which share a dependency. In the example shown below, computing…
user73445
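A sketch of the shared-dependency behaviour: one dask.compute call evaluates both outputs from a single run of the common task, while separate .compute() calls would repeat it (memory pressure then depends on how long that shared result must stay live):

```python
import dask
from dask import delayed

calls = []

@delayed
def load():
    calls.append(1)        # count how many times the shared step runs
    return list(range(100))

@delayed
def total(data):
    return sum(data)

@delayed
def count(data):
    return len(data)

data = load()
a, b = total(data), count(data)

# One dask.compute call shares the `load` dependency between both
# outputs; a.compute() followed by b.compute() would run `load` twice.
print(dask.compute(a, b))  # (4950, 100)
print(len(calls))  # 1
```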