Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.
Questions tagged [dask-distributed]
1090 questions
3
votes
1 answer
Is there a dask API to get the current number of tasks in a dask cluster
I have come across an issue where the dask scheduler gets killed (though workers keep running) with a memory error if a large number of tasks are submitted in a short period of time.
If it's possible to get the current number of tasks on the cluster, then it's easy…

Santosh Kumar
- 761
- 5
- 28
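One way to check the scheduler's load before submitting more work is Client.run_on_scheduler: a function whose argument is named dask_scheduler receives the Scheduler object itself. A minimal sketch (the scheduler address is hypothetical):

    from dask.distributed import Client

    client = Client("tcp://scheduler:8786")  # hypothetical address

    def count_tasks(dask_scheduler):
        # The scheduler tracks all known tasks in a key -> TaskState mapping.
        return len(dask_scheduler.tasks)

    print(client.run_on_scheduler(count_tasks))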
3
votes
0 answers
How to enable proper work stealing in dask.distributed when using task restrictions / worker resources?
Context
I'm using dask.distributed to parallelise computations across machines. I therefore have dask-workers running on the different machines which connect to a dask-scheduler, to which I can then submit my custom graphs together with the…

malbert
- 308
- 1
- 7
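For context: worker resources are declared when each worker starts and requested per task at submit time. A minimal sketch, with the resource name FOO and the function slow as hypothetical stand-ins:

    from dask.distributed import Client

    # Workers declare resources at startup, e.g.:
    #   dask-worker tcp://scheduler:8786 --resources "FOO=1"
    client = Client("tcp://scheduler:8786")  # hypothetical address

    def slow(x):
        return x * 2

    # Only workers that declared FOO are allowed to run this task.
    future = client.submit(slow, 21, resources={"FOO": 1})
    print(future.result())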
3
votes
1 answer
Using dask distributed computing via a Jupyter notebook
I am seeing strange behavior from dask when using it from a Jupyter notebook. I am starting a local client and giving it a list of jobs to do. My real code is a bit complex, so I am putting a simple example here:
from dask.distributed…

Samaneh Navabpour
- 71
- 2
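For reference, a minimal self-contained version of that pattern, a local client mapped over a list of jobs, which behaves the same in a notebook or a script:

    from dask.distributed import Client

    client = Client(n_workers=2, threads_per_worker=1)  # starts a LocalCluster

    def inc(x):
        return x + 1

    futures = client.map(inc, range(10))
    print(client.gather(futures))  # [1, 2, ..., 10]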
3
votes
1 answer
Storing dask collection to files/CSV asynchronously
I'm implementing various kinds of data processing pipelines using dask.distributed. Usually the original data is read from S3 and, in the end, the processed (large) collection is written to CSV on S3 as well.
I can run the processing asynchronously…

evilkonrex
- 255
- 2
- 10
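One common pattern here is to ask to_csv for its task graph rather than letting it block: with compute=False it returns delayed write tasks that the client can schedule in the background. A sketch with hypothetical S3 paths and processing:

    import dask.dataframe as dd
    from dask.distributed import Client

    client = Client()  # local cluster for illustration

    ddf = dd.read_csv("s3://bucket/in-*.csv")    # hypothetical input
    processed = ddf[ddf.value > 0]               # hypothetical processing

    # compute=False returns delayed write tasks instead of blocking.
    writes = processed.to_csv("s3://bucket/out-*.csv", compute=False)
    futures = client.compute(writes)  # returns immediately; work runs in background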
2
votes
1 answer
AttributeError: module 'pandas.core.strings' has no attribute 'StringMethods' when importing Dask
I am getting the error stated in the question title when trying to import the dask.dataframe interface, even though import dask works.
My current version of dask is 2022.7.0. What might be the problem?

Bex T.
- 1,062
- 1
- 12
- 28
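Errors like this usually point to a dask/pandas version mismatch (dask touching pandas internals that moved between releases); printing the installed pair is a quick first check, and upgrading dask to a release that supports the installed pandas typically resolves it:

    import dask
    import pandas

    # Each dask release supports a range of pandas versions; mismatched
    # pairs can break `import dask.dataframe` even when `import dask` works.
    print("dask:", dask.__version__)
    print("pandas:", pandas.__version__)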
2
votes
1 answer
Dask task distribution with a synthetic test
I am trying to use Dask to distribute calculations over multiple systems.
However, there is some concept I fail to understand, because I cannot reproduce a logical behavior with a simple test that I was using for Python multiprocessing.
I am using…

oxedions
- 61
- 3
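A sketch of such a synthetic test in the multiprocessing style: with fixed-cost tasks and enough of them, wall time should shrink roughly with total cores, while very short tasks end up dominated by scheduling overhead (scheduler address hypothetical):

    import time
    from dask.distributed import Client

    client = Client("tcp://scheduler:8786")  # hypothetical address

    def work(i):
        time.sleep(1)  # simulate a fixed-cost task
        return i

    start = time.time()
    results = client.gather(client.map(work, range(64)))
    print(len(results), "tasks in", round(time.time() - start, 1), "s")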
2
votes
1 answer
Why does dask DataFrame.to_parquet try to infer the data schema when storing the file to disk?
I'm having some trouble making sense of Dask's to_parquet method and why it has a schema argument. When I have a Dask DataFrame variable named ddf and access ddf.dtypes, I can see the data types of each column, meaning that Dask does know the dtype…

Vinicius Silva
- 518
- 1
- 5
- 13
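The short answer is that pandas dtypes do not map one-to-one onto Parquet types: an object column could hold strings, bytes, or nested values, so to_parquet inspects the data to build a pyarrow schema. Passing one explicitly avoids inference failing on empty or all-null partitions; a sketch (column name hypothetical):

    import pandas as pd
    import pyarrow as pa
    import dask.dataframe as dd

    ddf = dd.from_pandas(pd.DataFrame({"name": ["a", None]}), npartitions=2)

    # An explicit pyarrow type removes the need to infer from (possibly
    # empty or all-null) partition data.
    ddf.to_parquet("out.parquet", engine="pyarrow", schema={"name": pa.string()})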
2
votes
1 answer
How to store data from dask.distributed on disk?
I'm trying to scale my computations from local Dask Arrays to Dask Distributed.
Unfortunately, I am new to distributed computing, so I could not adapt the answer here for my purpose.
Mainly my problem is saving data from distributed computations back…

Helmut
- 311
- 1
- 9
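One approach is to have the workers write their own chunks directly to a chunk-friendly store, so results never gather on the client; a minimal sketch using zarr (requires the zarr package):

    import dask.array as da
    from dask.distributed import Client

    client = Client()  # local cluster for illustration

    x = da.random.random((20000, 20000), chunks=(2000, 2000))
    y = (x + x.T).mean(axis=0)

    # Each worker writes its own chunks; nothing is collected on the client.
    da.to_zarr(y, "result.zarr")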
2
votes
1 answer
How to get the worker name in dask cluster?
I am able to find out the worker address, but I want to know the name. Is there any method to find it out?
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(name='AAA', n_workers=1, threads_per_worker=2)
client =…

Professor
- 87
- 6
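The names are available from the scheduler's view of the cluster; completing the question's snippet with client.scheduler_info():

    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(name='AAA', n_workers=1, threads_per_worker=2)
    client = Client(cluster)

    # scheduler_info() lists each worker's address, name, and more.
    for address, info in client.scheduler_info()["workers"].items():
        print(address, info["name"])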
2
votes
1 answer
Difference between dask node and compute node for slurm configuration
First off, apologies if I use confusing or incorrect terminology; I am still learning.
I am trying to set up the configuration for a Slurm-enabled adaptive cluster.
The supercomputer and its Slurm configuration are documented here. Here…

pgierz
- 674
- 3
- 7
- 14
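For orientation, in dask-jobqueue one SLURMCluster "job" is one Slurm allocation, and each job can host several dask worker processes; a sketch with hypothetical queue and sizing:

    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(
        queue="compute",     # hypothetical partition name
        cores=48,            # cores per Slurm job
        processes=6,         # dask worker processes per Slurm job
        memory="192GB",      # memory per Slurm job
        walltime="01:00:00",
    )
    cluster.scale(jobs=2)    # two Slurm jobs -> 12 dask workers
    client = Client(cluster)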
2
votes
2 answers
Dask dataframe parallel task
I want to create features (additional columns) from a dataframe, and I have the following structure for many functions.
Following this documentation https://docs.dask.org/en/stable/delayed-best-practices.html I have come up with the code…

J.Ewa
- 205
- 3
- 14
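That best-practices page warns against calling delayed on a dask collection; for column-building functions, map_partitions applies ordinary pandas code to each partition in parallel. A sketch with a hypothetical feature:

    import pandas as pd
    import dask.dataframe as dd

    ddf = dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=2)

    def add_features(df):
        df = df.copy()
        df["a_squared"] = df["a"] ** 2  # hypothetical feature column
        return df

    out = ddf.map_partitions(add_features)
    print(out.compute())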
2
votes
1 answer
How to parallelize concat in Dask?
I am learning to use Dask for parallel data processing for my university project. I connected two nodes to process data using Dask.
My data frame contains customer IDs, dates, and transactions. The file is about 40 GB. I used dask.dataframe to read…

cs201503
- 21
- 2
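Note that dd.concat is lazy and cheap by itself; it only extends the task graph, and the real work is distributed across workers at compute time. A small sketch with hypothetical columns:

    import pandas as pd
    import dask.dataframe as dd

    ddf1 = dd.from_pandas(
        pd.DataFrame({"customer_id": [1, 2], "amount": [10.0, 20.0]}), npartitions=1)
    ddf2 = dd.from_pandas(
        pd.DataFrame({"customer_id": [1, 3], "amount": [5.0, 7.0]}), npartitions=1)

    combined = dd.concat([ddf1, ddf2])  # lazy: no data moves yet
    print(combined.groupby("customer_id").amount.sum().compute())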
2
votes
1 answer
How do I submit a class to a Dask-Cluster?
I might misunderstand how Dask's submit() function works. If I submit a function of my class that initializes a parameter, it is not working.
Question: What is the correct way to submit a class to a dask-cluster using .submit()?
So, I…

Christine
- 53
- 8
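When the goal is a long-lived stateful instance rather than a one-shot call, dask's actor interface fits: submitting the class with actor=True keeps one live instance on a worker. A minimal sketch:

    from dask.distributed import Client

    client = Client()  # local cluster for illustration

    class Counter:
        def __init__(self):
            self.n = 0

        def increment(self):
            self.n += 1
            return self.n

    # actor=True keeps one live instance on a worker; a plain submit
    # would only ship pickled copies around.
    counter = client.submit(Counter, actor=True).result()
    print(counter.increment().result())  # 1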
2
votes
1 answer
Dask Dataframe shape attribute is giving wrong shape
I'm trying to find the shape of a subset of a larger dask dataframe, but instead of getting the right shape (number of rows), I'm getting a wrong value.
In the example, I stored the first 3 rows in a new dataframe; when I'm trying to find the…

jhanv
- 59
- 6
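The row count in a dask DataFrame's shape is deliberately lazy (a Delayed object), so it prints as something other than a plain integer until computed:

    import pandas as pd
    import dask.dataframe as dd

    ddf = dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=2)
    sub = ddf.loc[:2]  # first 3 rows (divisions are known here)

    print(sub.shape)               # (Delayed(...), 1): row count is lazy
    print(sub.shape[0].compute())  # 3
    print(len(sub))                # 3; len() also triggers a compute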
2
votes
1 answer
map_partitions runs twice when storing dask dataframe in parquet and records are counted
I have a dask process that runs a function on each dataframe partition. I let to_parquet do the compute() that runs the functions.
But I also need to know the number of records in the parquet table. For that, I use ddf.map_partitions(len). Problem…

ps0604
- 1,227
- 23
- 133
- 330
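A common fix is to hand both results to a single dask.compute call, so the shared graph is evaluated once for the write and the count:

    import dask
    import pandas as pd
    import dask.dataframe as dd

    ddf = dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=2)

    write = ddf.to_parquet("out.parquet", compute=False)  # delayed write
    counts = ddf.map_partitions(len)

    # One compute call shares intermediate results, so each partition's
    # work runs once for both the write and the record count.
    _, per_partition = dask.compute(write, counts)
    print(int(per_partition.sum()))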