Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.
Questions tagged [dask-distributed]
1090 questions
3
votes
1 answer
Is there a dask API to get the current number of tasks in a dask cluster
I have come across an issue where the dask scheduler gets killed (though workers keep running) with a memory error if a large number of tasks are submitted in a short period of time.
If it's possible to get the current number of tasks on the cluster, then it's easy…

Santosh Kumar
- 761
- 5
- 28
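One way to check the scheduler's load before submitting more work is Client.run_on_scheduler: a function whose argument is named dask_scheduler receives the Scheduler object itself. A minimal sketch (the scheduler address is hypothetical):

    from dask.distributed import Client

    client = Client("tcp://scheduler:8786")  # hypothetical address

    def count_tasks(dask_scheduler):
        # The scheduler tracks all known tasks in a key -> TaskState mapping.
        return len(dask_scheduler.tasks)

    print(client.run_on_scheduler(count_tasks))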
3
votes
0 answers
How to enable proper work stealing in dask.distributed when using task restrictions / worker resources?
Context
I'm using dask.distributed to parallelise computations across machines. I therefore have dask-workers running on the different machines which connect to a dask-scheduler, to which I can then submit my custom graphs together with the…

malbert
- 308
- 1
- 7
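For context: worker resources are declared when each worker starts and requested per task at submit time. A minimal sketch, with the resource name FOO and the function slow as hypothetical stand-ins:

    from dask.distributed import Client

    # Workers declare resources at startup, e.g.:
    #   dask-worker tcp://scheduler:8786 --resources "FOO=1"
    client = Client("tcp://scheduler:8786")  # hypothetical address

    def slow(x):
        return x * 2

    # Only workers that declared FOO are allowed to run this task.
    future = client.submit(slow, 21, resources={"FOO": 1})
    print(future.result())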
3
votes
1 answer
Using dask distributed computing via a Jupyter notebook
I am seeing strange behavior from dask when using it from a Jupyter notebook. I am starting a local client and giving it a list of jobs to do. My real code is a bit complex, so I am putting a simple example here:
from dask.distributed…

Samaneh Navabpour
- 71
- 2
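For reference, a minimal self-contained version of that pattern, a local client mapped over a list of jobs, which behaves the same in a notebook or a script:

    from dask.distributed import Client

    client = Client(n_workers=2, threads_per_worker=1)  # starts a LocalCluster

    def inc(x):
        return x + 1

    futures = client.map(inc, range(10))
    print(client.gather(futures))  # [1, 2, ..., 10]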
3
votes
1 answer
Storing dask collection to files/CSV asynchronously
I'm implementing various kinds of data processing pipelines using dask.distributed. Usually the original data is read from S3 and, in the end, the processed (large) collection is written to CSV on S3 as well.
I can run the processing asynchronously…

evilkonrex
- 255
- 2
- 10
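One common pattern here is to ask to_csv for its task graph rather than letting it block: with compute=False it returns delayed write tasks that the client can schedule in the background. A sketch with hypothetical S3 paths and processing:

    import dask.dataframe as dd
    from dask.distributed import Client

    client = Client()  # local cluster for illustration

    ddf = dd.read_csv("s3://bucket/in-*.csv")    # hypothetical input
    processed = ddf[ddf.value > 0]               # hypothetical processing

    # compute=False returns delayed write tasks instead of blocking.
    writes = processed.to_csv("s3://bucket/out-*.csv", compute=False)
    futures = client.compute(writes)  # returns immediately; work runs in background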
2
votes
1 answer
AttributeError: module 'pandas.core.strings' has no attribute 'StringMethods' when importing Dask
I am getting the error stated in the question title when trying to import the dask.dataframe interface, even though import dask works.
My current version of dask is 2022.7.0. What might be the problem?

Bex T.
- 1,062
- 1
- 12
- 28
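Errors like this usually point to a dask/pandas version mismatch (dask touching pandas internals that moved between releases); printing the installed pair is a quick first check, and upgrading dask to a release that supports the installed pandas typically resolves it:

    import dask
    import pandas

    # Each dask release supports a range of pandas versions; mismatched
    # pairs can break `import dask.dataframe` even when `import dask` works.
    print("dask:", dask.__version__)
    print("pandas:", pandas.__version__)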
2
votes
1 answer
Dask task distribution with a synthetic test
I am trying to use Dask to distribute calculations over multiple systems.
However, there is some concept I fail to understand, because I cannot reproduce a logical behavior with a simple test that I was using for Python multiprocessing.
I am using…

oxedions
- 61
- 3
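A sketch of such a synthetic test in the multiprocessing style: with fixed-cost tasks and enough of them, wall time should shrink roughly with total cores, while very short tasks end up dominated by scheduling overhead (scheduler address hypothetical):

    import time
    from dask.distributed import Client

    client = Client("tcp://scheduler:8786")  # hypothetical address

    def work(i):
        time.sleep(1)  # simulate a fixed-cost task
        return i

    start = time.time()
    results = client.gather(client.map(work, range(64)))
    print(len(results), "tasks in", round(time.time() - start, 1), "s")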
2
votes
1 answer
Why does dask DataFrame.to_parquet try to infer the data schema when storing the file to disk?
I'm having some trouble making sense of Dask's to_parquet method and why it has a schema argument. When I have a Dask DataFrame variable named ddf and access ddf.dtypes, I can see the data types of each column, meaning that Dask does know the dtype…

Vinicius Silva
- 518
- 1
- 5
- 13
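The short answer is that pandas dtypes do not map one-to-one onto Parquet types: an object column could hold strings, bytes, or nested values, so to_parquet inspects the data to build a pyarrow schema. Passing one explicitly avoids inference failing on empty or all-null partitions; a sketch (column name hypothetical):

    import pandas as pd
    import pyarrow as pa
    import dask.dataframe as dd

    ddf = dd.from_pandas(pd.DataFrame({"name": ["a", None]}), npartitions=2)

    # An explicit pyarrow type removes the need to infer from (possibly
    # empty or all-null) partition data.
    ddf.to_parquet("out.parquet", engine="pyarrow", schema={"name": pa.string()})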
2
votes
1 answer
How to store data from dask.distributed on disk?
I'm trying to scale my computations from local Dask Arrays to Dask Distributed.
Unfortunately, I am new to distributed computing, so I could not adapt the answer here for my purpose.
Mainly my problem is saving data from distributed computations back…

Helmut
- 311
- 1
- 9
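One approach is to have the workers write their own chunks directly to a chunk-friendly store, so results never gather on the client; a minimal sketch using zarr (requires the zarr package):

    import dask.array as da
    from dask.distributed import Client

    client = Client()  # local cluster for illustration

    x = da.random.random((20000, 20000), chunks=(2000, 2000))
    y = (x + x.T).mean(axis=0)

    # Each worker writes its own chunks; nothing is collected on the client.
    da.to_zarr(y, "result.zarr")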
2
votes
1 answer
How to get the worker name in dask cluster?
I am able to find out the worker address, but I want to know the name. Is there any method to find it out?
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(name='AAA', n_workers=1, threads_per_worker=2)
client =…

Professor
- 87
- 6
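The names are available from the scheduler's view of the cluster; completing the question's snippet with client.scheduler_info():

    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(name='AAA', n_workers=1, threads_per_worker=2)
    client = Client(cluster)

    # scheduler_info() lists each worker's address, name, and more.
    for address, info in client.scheduler_info()["workers"].items():
        print(address, info["name"])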
2
votes
1 answer
Difference between dask node and compute node for slurm configuration
First off, apologies if I use confusing or incorrect terminology; I am still learning.
I am trying to set up the configuration for a Slurm-enabled adaptive cluster.
The supercomputer and its Slurm configuration are documented here. Here…

pgierz
- 674
- 3
- 7
- 14
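For orientation, in dask-jobqueue one SLURMCluster "job" is one Slurm allocation, and each job can host several dask worker processes; a sketch with hypothetical queue and sizing:

    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(
        queue="compute",     # hypothetical partition name
        cores=48,            # cores per Slurm job
        processes=6,         # dask worker processes per Slurm job
        memory="192GB",      # memory per Slurm job
        walltime="01:00:00",
    )
    cluster.scale(jobs=2)    # two Slurm jobs -> 12 dask workers
    client = Client(cluster)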
2
votes
2 answers
Dask dataframe parallel task
I want to create features (additional columns) from a dataframe, and I have the following structure for many functions.
Following this documentation https://docs.dask.org/en/stable/delayed-best-practices.html I have come up with the code…

J.Ewa
- 205
- 3
- 14
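That best-practices page warns against calling delayed on a dask collection; for column-building functions, map_partitions applies ordinary pandas code to each partition in parallel. A sketch with a hypothetical feature:

    import pandas as pd
    import dask.dataframe as dd

    ddf = dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=2)

    def add_features(df):
        df = df.copy()
        df["a_squared"] = df["a"] ** 2  # hypothetical feature column
        return df

    out = ddf.map_partitions(add_features)
    print(out.compute())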
2
votes
1 answer
How to parallelize concat in Dask?
I am learning to use Dask for parallel data processing for my university project. I connected two nodes to process data using Dask.
My data frame contains customer IDs, dates, and transactions. The file is about 40 GB. I used dask.dataframe to read…

cs201503
- 21
- 2
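Note that dd.concat is lazy and cheap by itself; it only extends the task graph, and the real work is distributed across workers at compute time. A small sketch with hypothetical columns:

    import pandas as pd
    import dask.dataframe as dd

    ddf1 = dd.from_pandas(
        pd.DataFrame({"customer_id": [1, 2], "amount": [10.0, 20.0]}), npartitions=1)
    ddf2 = dd.from_pandas(
        pd.DataFrame({"customer_id": [1, 3], "amount": [5.0, 7.0]}), npartitions=1)

    combined = dd.concat([ddf1, ddf2])  # lazy: no data moves yet
    print(combined.groupby("customer_id").amount.sum().compute())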
2
votes
1 answer
How do I submit a class to a Dask-Cluster?
I might misunderstand how Dask's submit() function works. If I submit a function of my class that initializes a parameter, it is not working.
Question: What is the correct way to submit a class to a dask-cluster using .submit()?
So, I…

Christine
- 53
- 8
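When the goal is a long-lived stateful instance rather than a one-shot call, dask's actor interface fits: submitting the class with actor=True keeps one live instance on a worker. A minimal sketch:

    from dask.distributed import Client

    client = Client()  # local cluster for illustration

    class Counter:
        def __init__(self):
            self.n = 0

        def increment(self):
            self.n += 1
            return self.n

    # actor=True keeps one live instance on a worker; a plain submit
    # would only ship pickled copies around.
    counter = client.submit(Counter, actor=True).result()
    print(counter.increment().result())  # 1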
2
votes
1 answer
Dask Dataframe shape attribute is giving wrong shape
I'm trying to find the shape of a subset of a larger dask dataframe, but instead of getting the right shape (number of rows), I'm getting a wrong value.
In the example, I stored the first 3 rows in a new dataframe; when I'm trying to find the…

jhanv
- 59
- 6
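The row count in a dask DataFrame's shape is deliberately lazy (a Delayed object), so it prints as something other than a plain integer until computed:

    import pandas as pd
    import dask.dataframe as dd

    ddf = dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=2)
    sub = ddf.loc[:2]  # first 3 rows (divisions are known here)

    print(sub.shape)               # (Delayed(...), 1): row count is lazy
    print(sub.shape[0].compute())  # 3
    print(len(sub))                # 3; len() also triggers a compute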
2
votes
1 answer
map_partitions runs twice when storing dask dataframe in parquet and records are counted
I have a dask process that runs a function on each dataframe partition. I let to_parquet do the compute() that runs the functions.
But I also need to know the number of records in the parquet table. For that, I use ddf.map_partitions(len). Problem…

ps0604
- 1,227
- 23
- 133
- 330
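A common fix is to hand both results to a single dask.compute call, so the shared graph is evaluated once for the write and the count:

    import dask
    import pandas as pd
    import dask.dataframe as dd

    ddf = dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=2)

    write = ddf.to_parquet("out.parquet", compute=False)  # delayed write
    counts = ddf.map_partitions(len)

    # One compute call shares intermediate results, so each partition's
    # work runs once for both the write and the record count.
    _, per_partition = dask.compute(write, counts)
    print(int(per_partition.sum()))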