Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.
Questions tagged [dask-distributed]
1090 questions
0
votes
1 answer
How to prevent dask client from dying on worker exception?
I'm not understanding the resiliency model in dask distributed.
Problem
An exception raised by a worker kills the embarrassingly parallel dask operation. All workers and clients die if any worker encounters an exception.
Expected Behavior
Reading here:…

bw4sz
- 2,237
- 2
- 29
- 53
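A hedged sketch for the resiliency question above: an exception raised inside a task marks only that future as failed, and the client can skip failed futures when gathering. The flaky function and the local Client() are illustrative assumptions, not the asker's code.

from dask.distributed import Client

def flaky(x):
    if x == 3:
        raise ValueError("bad input")  # simulate a worker-side exception
    return x * 2

if __name__ == "__main__":
    client = Client()  # small local cluster just for illustration
    futures = client.map(flaky, range(6))
    # errors="skip" drops failed futures instead of re-raising on the client,
    # so one bad task does not take down the whole embarrassingly parallel run
    results = client.gather(futures, errors="skip")
    print(results)  # [0, 2, 4, 8, 10] -- the x == 3 task is omitted
    client.close()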
0
votes
1 answer
What does one enter on the command line to run spark in a bokeh serve app? Do I simply separate the two command line entries by &&?
My effort does not work:
/usr/local/spark/spark-2.3.2-bin-hadoop2.7/bin/spark-submit --driver-memory 6g --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.2 runspark.py && bokeh serve --show bokeh_app
runspark.py contains the…

Andre Mayers PhD
- 1
- 3
0
votes
1 answer
parallel execution of dask `DataFrame.set_index()`
I am trying to create an index on a large dask dataframe. No matter which scheduler I use, I am unable to utilize more than the equivalent of one core for the operation. The code is:
(ddf
  .read_parquet(pq_in)
  .set_index('title', drop=True,…

Daniel Mahler
- 7,653
- 5
- 51
- 90
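A minimal sketch related to the set_index question above, assuming the parquet input has many partitions and a multi-worker client is attached; the paths are placeholders, not the asker's.

import dask.dataframe as dd
from dask.distributed import Client

client = Client()                       # multi-worker local cluster
df = dd.read_parquet("data/*.parquet")  # placeholder for pq_in
# set_index triggers a full shuffle; it only parallelizes across partitions,
# so an input with a single large partition still runs on one core
df = df.set_index("title", drop=True)
df.to_parquet("indexed/")
client.close()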
0
votes
1 answer
Limitations to using LocalCluster? Crashing persisting 50GB of data to 90GB of memory
System Info: CentOS, python 3.5.2, 64 cores, 96 GB ram
So I'm trying to load a large array (50GB) from an hdf file into ram (96GB). Each chunk is around 1.5GB less than the worker memory limit. It never seems to complete, sometimes crashing or…

dead_zero
- 15
- 1
- 5
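A hedged configuration sketch for the LocalCluster question above: sizing fewer, larger workers so each ~1.5GB chunk sits well under its worker's memory limit. The worker counts and limits are illustrative assumptions for a 64-core, 96GB machine.

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(
    n_workers=8,             # fewer, larger workers instead of one per core
    threads_per_worker=8,
    memory_limit="11GB",     # per-worker limit; 8 workers stay under 96GB total
)
client = Client(cluster)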
0
votes
1 answer
Cannot start dask cluster over SSH
I'm trying to start a dask cluster over SSH, but I am encountering strange errors like these:
Exception in thread Thread-6:
Traceback (most recent call last):
File "/home/localuser/miniconda3/lib/python3.6/threading.py", line 916, in…

suvayu
- 4,271
- 2
- 29
- 35
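Not the asker's setup, but a sketch of starting the same kind of cluster from Python with the SSHCluster helper available in newer distributed releases; the host names and remote Python path are assumptions.

from dask.distributed import Client, SSHCluster

cluster = SSHCluster(
    ["scheduler-host", "worker-host-1", "worker-host-2"],  # first host runs the scheduler
    connect_options={"known_hosts": None},
    worker_options={"nthreads": 4},
    remote_python="/home/localuser/miniconda3/bin/python",
)
client = Client(cluster)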
0
votes
1 answer
dask jobqueue worker failure at startup 'Resource temporarily unavailable'
I'm running dask over SLURM via jobqueue and I have been getting 3 errors pretty consistently...
Basically my question is what could be causing these failures? At first glance the problem is that too many workers are writing to disk at once, or my…

Mr. Buttons
- 463
- 1
- 3
- 9
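A hedged dask-jobqueue sketch relevant to the "too many workers writing to disk at once" suspicion above: pointing workers at node-local scratch space. The resource numbers and the scratch path are illustrative assumptions.

from dask_jobqueue import SLURMCluster
from dask.distributed import Client

cluster = SLURMCluster(
    cores=8,
    memory="32GB",
    local_directory="/tmp/dask-worker-space",  # node-local spill/scratch directory
)
cluster.scale(jobs=10)   # request ten SLURM jobs' worth of workers
client = Client(cluster)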
0
votes
1 answer
Route to dask worker debug pages
The docs say:
Debug Worker pages for each worker at http://worker-address:8789.
These pages have detailed diagnostic information about the worker.
Like the diagnostic scheduler pages they are of more utility to
developers or to people looking to…

Adam Thornton
- 33
- 4
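A small sketch for the worker-pages question above: the scheduler knows every worker's host, so the per-worker diagnostic URLs can be listed from the client. The scheduler address is an assumption; 8789 is the default port quoted in the docs.

from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")   # illustrative scheduler address
for addr, worker in client.scheduler_info()["workers"].items():
    print(f"worker {addr} -> diagnostics at http://{worker['host']}:8789")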
0
votes
0 answers
Is there a way to store and display dask distributed history
Is there a way to store and display (over Bokeh) dask distributed history?
I would like to analyse/compare old dask distributed runs.

sami
- 501
- 2
- 6
- 18
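One hedged option for the history question above, available in newer distributed releases: performance_report writes the task stream and profiles of a run to a standalone Bokeh HTML file that can be kept and compared later. The workload shown is a stand-in.

import dask.array as da
from dask.distributed import Client, performance_report

client = Client()
with performance_report(filename="run-report.html"):   # saved Bokeh report
    x = da.random.random((20000, 20000), chunks=(2000, 2000))
    x.mean().compute()
# open run-report.html in a browser later to review or compare this run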
0
votes
1 answer
How to composite tasks in dask-distributed
I am trying to run a joblib parallel loop inside a threaded dask-distributed cluster (see below for the reason), but I can't get any speedup because of the GIL. Here's an example:
def task(x):
""" Sample single-process task that takes between 2 and…

A32167
- 26
- 2
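A hedged sketch for the GIL question above: pure-Python, CPU-bound tasks only scale when the workers are separate processes, since threads share one interpreter lock. The task body is a stand-in for the asker's function.

from dask.distributed import Client, LocalCluster

def task(x):
    # CPU-bound pure-Python work holds the GIL, so threads alone give no speedup
    return sum(i * i for i in range(10**6)) + x

if __name__ == "__main__":
    cluster = LocalCluster(n_workers=4, threads_per_worker=1, processes=True)
    client = Client(cluster)
    results = client.gather(client.map(task, range(16)))
    client.close()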
0
votes
1 answer
Analyzing data flow of Dask dataframes
I have a dataset stored in a tab-separated text file. The file looks as follows:
date time temperature
2010-01-01 12:00:00 10.0000
...
where the temperature column contains values in degrees Celsius (°C).
I compute the daily average…

Giorgio
- 5,023
- 6
- 41
- 71
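A minimal sketch of the kind of computation described above, assuming a tab-separated file named temperatures.tsv with the columns shown.

import dask.dataframe as dd

df = dd.read_csv("temperatures.tsv", sep="\t", parse_dates=["date"])
daily_mean = df.groupby("date")["temperature"].mean()   # °C, averaged per day
print(daily_mean.compute())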
0
votes
1 answer
Unable to catch KeyboardInterrupt exception after starting dask.distributed Client/LocalClient
I'm trying to use Ctrl+C to gracefully stop my running code, including a local dask.distributed Client. The code below is an example of my setup. When I use Ctrl+C, the stop() method is called properly, however the dask Client seems to be improperly…

user7458
- 19
- 2
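A hedged sketch for the Ctrl+C question above: closing the client and cluster in a finally block so the local cluster is torn down even when a KeyboardInterrupt arrives mid-loop. The sleep loop stands in for the real work.

import time
from dask.distributed import Client, LocalCluster

def main():
    cluster = LocalCluster()
    client = Client(cluster)
    try:
        while True:
            time.sleep(1)        # stand-in for the long-running work
    except KeyboardInterrupt:
        print("stopping...")
    finally:
        client.close()
        cluster.close()

if __name__ == "__main__":
    main()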
0
votes
1 answer
Dask distributed perform computations without returning data
I have a dynamic Dask Kubernetes cluster.
I want to load 35 parquet files (about 1.2GB) from Gcloud storage into a Dask DataFrame, then process it with apply() and afterwards save the result as a parquet file back to Gcloud.
During loading files from Gcloud…

Vladyslav Moisieienkov
- 4,118
- 4
- 25
- 32
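A hedged sketch of keeping the data on the workers end-to-end for the question above: to_parquet writes directly from the workers to storage, so no result data has to flow back through the client. The bucket paths and the per-partition function are illustrative assumptions, and gcsfs is assumed to be installed.

import dask.dataframe as dd
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")   # the Kubernetes cluster's scheduler

df = dd.read_parquet("gcs://my-bucket/input/*.parquet")
processed = df.map_partitions(lambda part: part.assign(score=part["value"] * 2))
# writes happen on the workers; only small metadata returns to the client
processed.to_parquet("gcs://my-bucket/output/")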
0
votes
2 answers
Dask Distributed with Asynchronous Real-time Parallelism
I'm reading the documentation on dask.distributed and it looks like I could submit functions to the distributed cluster via client.submit().
I have an existing function some_func that is grabbing individual documents (say, a text file)…

slaw
- 6,591
- 16
- 56
- 109
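A small sketch for the real-time question above: submitting each document with client.submit and consuming results as they finish with as_completed. The some_func body and the documents are stand-ins.

from dask.distributed import Client, as_completed

def some_func(doc):
    return len(doc.split())   # stand-in for the real per-document work

client = Client()
futures = [client.submit(some_func, doc) for doc in ("text one", "another document")]
for future in as_completed(futures):
    print(future.result())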
0
votes
0 answers
YarnCluster constructor hangs in dask-yarn
I'm using dask-yarn version 0.3.1, following the basic example on https://dask-yarn.readthedocs.io/en/latest/.
from dask_yarn import YarnCluster
from dask.distributed import Client
# Create a cluster where each worker has two cores and eight GB of…

user1738628
- 51
- 1
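For reference, a hedged reconstruction of the docs-style construction the question above is following; the packaged environment name is an assumption.

from dask_yarn import YarnCluster
from dask.distributed import Client

# Create a cluster where each worker has two cores and eight GiB of memory
cluster = YarnCluster(
    environment="environment.tar.gz",   # conda-pack'd environment shipped to YARN
    worker_vcores=2,
    worker_memory="8GiB",
)
cluster.scale(4)            # ask YARN for four worker containers
client = Client(cluster)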
0
votes
0 answers
Dask client scatter is taking a long time for size of file dict in memory
I'm new to Dask and have recently made my foray into parallel computing with this nice and wonderful package. However, in my implementation, I've been struggling to understand why it takes 6 mins for me to scatter a python dict in my scheduler…

Winston Tan
- 1
- 2
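A hedged sketch for the slow-scatter question above: scattering a large dict once, skipping the content hashing that can dominate scatter time, and broadcasting it to every worker up front. The big_lookup dict is an illustrative stand-in for the file dict.

from dask.distributed import Client

client = Client()
big_lookup = {i: str(i) * 100 for i in range(100_000)}   # stand-in for the file dict
# hash=False skips hashing the payload to derive its key;
# broadcast=True copies the data to every worker once, up front
[future] = client.scatter([big_lookup], hash=False, broadcast=True)
print(client.submit(lambda d, k: d[k], future, 42).result())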