Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.
Questions tagged [dask-distributed]
1090 questions
0 votes, 1 answer
How to create a dask-array from CuPy array?
I'm trying to launch dask.cluster.Kmeans with a huge amount of data.
Working on the CPU is fine, since I wrap NumPy arrays with dask.array.
Working on the GPU doesn't seem to be possible due to functionality not yet implemented in cupy.
I've tried to…

asked by Rostislav Povelikin (61)
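A minimal sketch of the usual approach, assuming a recent dask.array: from_array accepts any object with a NumPy-like interface, and asarray=False keeps the chunks as CuPy arrays instead of coercing them to NumPy. The shapes below are illustrative.

import cupy
import dask.array as da

x = cupy.random.random((20000, 1000))
# asarray=False preserves the CuPy-backed chunks
dx = da.from_array(x, chunks=(2000, 1000), asarray=False)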
0 votes, 1 answer
Fail Dask application when too many workers fail
I'm running a Dask (1.2) application using Dask YARN (0.6.0) on an EMR cluster. Today I got into a situation where my workers were failing (due to an HDFS error) and the skein.ApplicationMaster would continuously recreate new workers.
Is there a way…

asked by gallamine (865)
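One hedged possibility, assuming a dask-yarn release that exposes the worker_restarts parameter described in its documentation: cap how many worker restarts YARN may attempt before failing the whole application. The values below are illustrative.

from dask_yarn import YarnCluster

# fail the application once ten worker restarts have been consumed,
# instead of recreating workers indefinitely
cluster = YarnCluster(environment='environment.tar.gz',
                      worker_restarts=10)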
0 votes, 1 answer
Cannot run dask-mpi with Python 3.7 -- timeout when connecting client to dask-mpi scheduler
I'm attempting to run the Dask-MPI "Getting Started" (http://mpi.dask.org/en/latest/) example in a fresh Anaconda environment.
I set up an environment using
conda create -n dask-mpi -c conda-forge python=3.7 dask-mpi
conda activate dask-mpi
Inside…

asked by nleaf (58)
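For reference, a minimal sketch of the documented dask-mpi pattern: initialize() splits the MPI ranks into a scheduler, a client, and workers, so the client connects without needing a scheduler address.

# launch with: mpirun -np 4 python example.py
from dask_mpi import initialize
initialize()

from dask.distributed import Client
client = Client()  # connects to the scheduler started by initialize()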
0 votes, 1 answer
How can I send to a remote dask-distributed cluster objects whose source code only exists locally?
I have a remote dask-distributed cluster to which I want to send a series of objects to be used during computations. The problem is the source code that defines the classes of those objects only exists locally and, as a consequence, pickling does…
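A minimal sketch of the usual workaround: ship the defining module to the workers with Client.upload_file, so unpickling on the worker side can import it. The address and module name here are hypothetical.

from dask.distributed import Client

client = Client('tcp://scheduler-host:8786')  # hypothetical address
# distribute the local source file to all connected workers
client.upload_file('my_local_module.py')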
0 votes, 1 answer
gathering a large dataframe back into master in dask distributed
I have a large (~180K row) dataframe for which
df.compute()
hangs when running dask with the distributed scheduler in local mode on an AWS m5.12xlarge (98 cores). All the workers remain nearly idle. However,
df.head(df.shape[0].compute(),…

asked by Daniel Mahler (7,653)
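One hedged alternative to a single monolithic compute, assuming client and df are the Client and dask dataframe from the question: pull the partitions back as individual pandas frames and concatenate them client-side, which avoids funneling the whole result through one task.

import pandas as pd

parts = client.gather(client.compute(df.to_delayed()))
result = pd.concat(parts)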
0 votes, 1 answer
dask-yarn on cluster: Unable to connect to application
I am trying to use dask-yarn to distribute Python jobs on a cluster.
I'm using the following code to create the cluster:
from dask_yarn import YarnCluster
cluster = YarnCluster(environment='.conda/envs/myconda', worker_vcores=2,…

asked by Nik Berry (11)
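A hedged sketch, assuming a dask-yarn release that supports the deploy_mode parameter: running the scheduler on the submitting node ('local') can sidestep connection failures caused by unreachable container ports.

from dask_yarn import YarnCluster
from dask.distributed import Client

cluster = YarnCluster(environment='.conda/envs/myconda',
                      worker_vcores=2,
                      deploy_mode='local')  # scheduler stays on this node
client = Client(cluster)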
0 votes, 1 answer
Using Not Yet Implemented Pandas Functions in Dask
I believe I saw a recommendation in one of the Dask tutorials on how to use Pandas functions that are not yet implemented in the Dask framework when working with Dask dataframes, but I can no longer find where I saw it. For example, I would…

asked by dan (183)
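The recommendation in question is most likely map_partitions, which applies a plain pandas function to each partition. A minimal sketch with a hypothetical frame:

import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({'x': [1.0, None, 3.0, None, 5.0]})
ddf = dd.from_pandas(pdf, npartitions=2)

# interpolate() runs as ordinary pandas inside each partition
result = ddf.map_partitions(lambda part: part.interpolate()).compute()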
0 votes, 1 answer
Possible to display worker names in Dask web UI?
I am able to add workers to the dask-scheduler and they appear in the web ui, but the workers are listed by their IP addresses, not the names I've given them.
When I create the workers (in a Python script), I do set the name:
import dask
import…

asked by dan (183)
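A small diagnostic sketch, assuming a scheduler at the usual local address: scheduler_info reports what each worker registered, so you can confirm whether the names actually reached the scheduler even when the UI shows addresses.

from dask.distributed import Client

client = Client('tcp://127.0.0.1:8786')  # hypothetical scheduler address
for addr, info in client.scheduler_info()['workers'].items():
    print(addr, info.get('name'))  # None if no name was registered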
0 votes, 1 answer
How to use dask.distributed API to specify the options for starting Bokeh web interface?
I'm trying to use the dask.distributed Python API to start a scheduler. The example provided at http://distributed.dask.org/en/latest/setup.html#using-the-python-api works as expected, but it does not provide insight into how to supply the options needed to…

asked by Alin Bobolea (3)
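A minimal sketch of one way to control the Bokeh dashboard from Python, assuming a dask version where LocalCluster accepts dashboard_address:

from dask.distributed import Client, LocalCluster

# choose the dashboard port explicitly instead of the default 8787
cluster = LocalCluster(dashboard_address=':8788')
client = Client(cluster)
print(cluster.dashboard_link)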
0 votes, 1 answer
Why is dask returning none on a CUDA function?
I'm trying to layer dask on top of my cuda functions, but when dask returns I get a NoneType object.
from numba import cuda
import numpy as np
from dask.distributed import Client, LocalCluster
@cuda.jit()
def addingNumbersCUDA (big_array,…

asked by Bryce Booze (165)
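For context, numba CUDA kernels always return None; results come back through an output array. A minimal sketch of a host-side wrapper that dask can call (the kernel and sizes are illustrative):

import numpy as np
from numba import cuda

@cuda.jit
def add_one_kernel(arr, out):
    i = cuda.grid(1)
    if i < arr.size:
        out[i] = arr[i] + 1

def add_one(arr):
    # the kernel writes into out; numba copies it back to the host
    out = np.empty_like(arr)
    threads = 64
    blocks = (arr.size + threads - 1) // threads
    add_one_kernel[blocks, threads](arr, out)
    return out  # submit this wrapper to dask, not the kernel itself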
0 votes, 1 answer
Waiting for external dependencies in dask
Context:
I'm using custom dask graphs to manage and distribute computations.
Problem:
Some tasks include reading in files which are produced outside of dask and not necessarily available at the time of calling…

asked by malbert (308)
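One hedged approach: make the wait itself a task, so downstream tasks depend on the file's appearance. The path, timeout, and reader below are hypothetical.

import os
import time
from dask import delayed

def wait_for_file(path, timeout=600, poll=5):
    # block this task until an externally produced file shows up
    deadline = time.time() + timeout
    while not os.path.exists(path):
        if time.time() > deadline:
            raise TimeoutError(f'{path} never appeared')
        time.sleep(poll)
    return path

def read_file(path):
    with open(path, 'rb') as f:
        return f.read()

result = delayed(read_file)(delayed(wait_for_file)('external/input.bin'))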
0 votes, 1 answer
Dask Distributed Local Directory
I would like to direct all dask temporary data to my fast and big disk at /mnt/1. I am running the scheduler like so:
dask-scheduler --local-directory /mnt/1
and the workers:
dask-worker 127.0.0.1:8786 --memory-limit 16GB --nthreads 1 --nprocs 6…

asked by Stephen (107)
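A hedged note in sketch form: the temporary directory is a per-worker setting, so it has to reach the workers (via the dask-worker --local-directory flag or configuration), not just the scheduler. Roughly equivalent in Python:

import dask

# roughly equivalent to passing --local-directory to each dask-worker
dask.config.set({'temporary-directory': '/mnt/1'})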
0 votes, 2 answers
Dask failing with/due to Tornado error 'too many files open'
I am running a Jupyter notebook launched from Anaconda. When trying to initialize a distributed Dask environment, the following Tornado package error is thrown:
tornado.application - ERROR - Multiple exceptions in yield list
Traceback (most recent…

asked by MikeB2019x (823)
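A common remedy, sketched under the assumption that the OS file-descriptor limit is the culprit: raise the process's soft limit before starting the cluster (Unix only).

import resource

# lift the soft file-descriptor limit up to the hard cap
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))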
0 votes, 1 answer
Dask use broadcasted pandas.DataFrame in apply function
I have some code which, for each record in a dask.DataFrame, samples a record from a pandas.DataFrame k times.
But it throws a warning:
UserWarning: Large object of size 1.12 MB detected in task graph:
( metric label group_1 group_2
6251…

asked by Georg Heiler (16,916)
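The full warning typically suggests the usual fix itself: scatter the pandas frame once with broadcast=True and pass the resulting future into the tasks, instead of baking the frame into the graph. A minimal sketch with made-up data:

import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client

client = Client()
lookup_df = pd.DataFrame({'metric': range(100)})  # stand-in table
ddf = dd.from_pandas(pd.DataFrame({'x': range(1000)}), npartitions=8)

# one copy per worker rather than one copy per task
lookup = client.scatter(lookup_df, broadcast=True)

def sample_per_partition(part, lookup_df):
    # the future is resolved to a plain pandas.DataFrame inside the task
    vals = lookup_df['metric'].sample(n=len(part), replace=True).to_numpy()
    return part.assign(y=vals)

meta = pd.DataFrame({'x': pd.Series(dtype='int64'),
                     'y': pd.Series(dtype='int64')})
result = ddf.map_partitions(sample_per_partition, lookup, meta=meta).compute()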
0 votes, 1 answer
Why is only one worker used?
I'm experimenting with Dask by running a local cluster with four workers on my laptop.
I distribute a Pandas dataframe between the workers, but when I run a function on them I see from the dashboard that only one of them is actually used.
What am I…

asked by Vincenzo Lavorini (1,884)
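A likely explanation, with a sketch: a dask dataframe with a single partition is a single task, so only one worker has anything to do; splitting it into more partitions lets the scheduler spread the work.

import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client

client = Client(n_workers=4)
pdf = pd.DataFrame({'x': range(100_000)})

# npartitions=1 would occupy one worker; eight partitions can fan out
ddf = dd.from_pandas(pdf, npartitions=8)
result = ddf.map_partitions(lambda part: part.x * 2).compute()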