Questions tagged [dask-distributed]

Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.

1090 questions
0 votes · 1 answer

How to create a dask-array from CuPy array?

I'm trying to launch dask.cluster.Kmeans with a huge amount of data. Working on CPU is fine since I wrap NumPy arrays with dask.array. Working on GPU doesn't seem to be possible due to unimplemented functionality in CuPy. I've tried to…
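For reference, the usual way to back a Dask array with CuPy is to wrap the GPU array with da.from_array and pass asarray=False so the chunks stay on the device; a minimal sketch (the array shape and chunk size here are arbitrary):

    import cupy
    import dask.array as da

    x = cupy.random.random((10000, 1000))        # data lives on the GPU
    # asarray=False keeps each chunk as a CuPy array instead of
    # coercing it to NumPy on the host
    dx = da.from_array(x, chunks=(1000, 1000), asarray=False)
    print(dx.sum().compute())
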
0 votes · 1 answer

Fail Dask application when too many workers fail

I'm running a Dask (1.2) application using Dask YARN (0.6.0) on an EMR cluster. Today I got into a situation where my workers were failing (due to an HDFS error) and the skein.ApplicationMaster would continuously recreate new workers. Is there a way…
gallamine • 865 • 2 • 12 • 26
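There is no obvious built-in switch for this in dask-yarn 0.6; one client-side workaround is to watch the worker count and tear the application down when it drops too far. A sketch (the address and threshold are placeholders):

    import time
    from dask.distributed import Client

    client = Client("tcp://scheduler-address:8786")  # placeholder address
    MIN_WORKERS = 4                                  # placeholder threshold

    while True:
        n = len(client.scheduler_info()["workers"])
        if n < MIN_WORKERS:
            client.shutdown()    # stops the scheduler and remaining workers
            raise RuntimeError(f"only {n} workers left; aborting application")
        time.sleep(30)
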
0 votes · 1 answer

Cannot run dask-mpi with Python 3.7 -- timeout when connecting client to dask-mpi scheduler

I'm attempting to run the Dask-MPI "Getting Started" (http://mpi.dask.org/en/latest/) example in a fresh Anaconda environment. I set up an environment using conda create -n dask-mpi -c conda-forge python=3.7 dask-mpi followed by conda activate dask-mpi. Inside…
nleaf • 58 • 3
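For comparison, the pattern from the Dask-MPI documentation looks roughly like this; initialize() turns MPI rank 0 into the scheduler, rank 1 runs the client code, and the remaining ranks become workers:

    # launched with e.g.: mpirun -np 4 python example.py
    from dask_mpi import initialize
    initialize()

    from dask.distributed import Client
    client = Client()    # connects to the scheduler started by initialize()
    print(client.submit(lambda x: x + 1, 10).result())
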
0 votes · 1 answer

How can I send to a remote dask-distributed cluster objects whose source code only exists locally?

I have a remote dask-distributed cluster to which I want to send a series of objects to be used during computations. The problem is that the source code defining the classes of those objects exists only locally and, as a consequence, pickling does…
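One known escape hatch is Client.upload_file, which ships a local .py (or .egg/.zip) file to every connected worker so that unpickling can import it there; a sketch, where the address, module, and class names are placeholders:

    from dask.distributed import Client

    client = Client("tcp://remote-scheduler:8786")   # placeholder address
    client.upload_file("my_local_module.py")         # placeholder module

    from my_local_module import MyClass              # placeholder class
    future = client.submit(lambda: MyClass().run())
    print(future.result())
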
0 votes · 1 answer

Gathering a large dataframe back into master in dask distributed

I have a large (~180K row) dataframe for which df.compute() hangs when running dask with the distributed scheduler in local mode on an AWS m5.12xlarge (98 cores). All the workers remain nearly idle. However, df.head(df.shape[0].compute(),…
Daniel Mahler • 7,653 • 5 • 51 • 90
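When a single monolithic compute() stalls, one workaround is to materialize the partitions individually and concatenate them on the client; a sketch, assuming df is the dask dataframe from the question:

    import pandas as pd
    from dask.distributed import Client

    client = Client()                       # local distributed scheduler

    parts = client.compute(df.to_delayed()) # one future per partition
    local = pd.concat(client.gather(parts)) # assemble the pandas result
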
0 votes · 1 answer

dask-yarn on cluster: Unable to connect to application

I am trying to use dask-yarn to distribute Python jobs on a cluster. I'm using the following code to create the cluster:

    from dask_yarn import YarnCluster
    cluster = YarnCluster(environment='.conda/envs/myconda', worker_vcores=2,…
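For context, dask-yarn generally wants a packaged environment (for example one built with conda-pack) that YARN can ship to the worker nodes, not a path that only exists on the edge node; a sketch with placeholder values:

    from dask_yarn import YarnCluster
    from dask.distributed import Client

    cluster = YarnCluster(environment="myconda.tar.gz",  # placeholder archive
                          worker_vcores=2,
                          worker_memory="4GiB")
    client = Client(cluster)
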
0 votes · 1 answer

Using Not Yet Implemented Pandas Functions in Dask

I believe I saw a recommendation in one of the Dask tutorials on how to use Pandas functions that are not yet implemented in the Dask framework when working with Dask dataframes, but I can no longer find where I saw it. For example, I would…
dan • 183 • 13
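The usual recommendation is map_partitions, which applies an arbitrary plain-pandas function to each partition; a small runnable sketch:

    import pandas as pd
    import dask.dataframe as dd

    ddf = dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=2)

    # Any pandas method works inside the callable; note it runs
    # independently on each partition, not on the global frame.
    result = ddf.map_partitions(lambda pdf: pdf.assign(b=pdf["a"].rank()))
    print(result.compute())
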
0 votes · 1 answer

Possible to display worker names in Dask web UI?

I am able to add workers to the dask-scheduler and they appear in the web UI, but the workers are listed by their IP addresses, not the names I've given them. When I create the workers (in a Python script), I do set the name:

    import dask
    import…
dan • 183 • 13
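Even when the dashboard table shows addresses, the registered names are kept in the scheduler state; a quick way to confirm they arrived (the address is a placeholder):

    from dask.distributed import Client

    client = Client("tcp://127.0.0.1:8786")   # placeholder address
    for addr, info in client.scheduler_info()["workers"].items():
        print(addr, info.get("name"))         # names set via name=/--name
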
0 votes · 1 answer

How to use dask.distributed API to specify the options for starting Bokeh web interface?

I'm trying to use the dask.distributed Python API to start a scheduler. The example provided at http://distributed.dask.org/en/latest/setup.html#using-the-python-api works as expected, but it does not provide insight on how to supply the options needed to…
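In recent distributed versions the web-interface options are plain keyword arguments; for example, dashboard_address controls where the Bokeh UI is served. A sketch using LocalCluster (the Scheduler class accepts the same keyword):

    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(dashboard_address=":8787")  # Bokeh UI port
    client = Client(cluster)
    print(cluster.dashboard_link)
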
0 votes · 1 answer

Why is dask returning none on a CUDA function?

I'm trying to layer dask on top of my CUDA functions, but when dask returns I get a NoneType object.

    from numba import cuda
    import numpy as np
    from dask.distributed import Client, LocalCluster

    @cuda.jit()
    def addingNumbersCUDA(big_array,…
Bryce Booze • 165 • 1 • 11
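A likely cause, independent of dask: numba @cuda.jit kernels cannot return values, so the function submitted to dask has to write into an output array and return that explicitly. A sketch of the pattern (names are illustrative):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def add_kernel(a, b, out):
        i = cuda.grid(1)
        if i < out.size:
            out[i] = a[i] + b[i]

    def add_on_gpu(a, b):
        out = np.empty_like(a)
        blocks = (a.size + 63) // 64
        add_kernel[blocks, 64](a, b, out)  # the kernel itself returns None
        return out                         # return the buffer instead

    # client.submit(add_on_gpu, x, y) then yields an array, not None
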
0 votes · 1 answer

Waiting for external dependencies in dask

Context: I'm using custom dask graphs to manage and distribute computations. Problem: Some tasks include reading in files which are produced outside of dask and not necessarily available at the time of calling…
malbert • 308 • 1 • 7
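One workaround is a task that polls for the externally produced file and only returns once it exists, so downstream tasks can simply depend on it; a sketch with placeholder paths and names:

    import os
    import time

    def wait_for_file(path, timeout=600, poll=5):
        # Blocks a worker thread until the file appears or we give up.
        deadline = time.time() + timeout
        while not os.path.exists(path):
            if time.time() > deadline:
                raise TimeoutError(path)
            time.sleep(poll)
        return path

    # f = client.submit(wait_for_file, "/data/external.nc")  # placeholder path
    # downstream = client.submit(process, f)                 # process: placeholder
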
0 votes · 1 answer

Dask Distributed Local Directory

I would like to direct all dask temporary data to my fast and large disk at /mnt/1. I am running the scheduler like so:

    dask-scheduler --local-directory /mnt/1

and the workers:

    dask-worker 127.0.0.1:8786 --memory-limit 16GB --nthreads 1 --nprocs 6…
Stephen • 107 • 9
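Note that it is the workers, not the scheduler, that spill temporary data to disk, so the flag belongs on each dask-worker invocation; alternatively the directory can be set once in the dask config. A sketch:

    # dask-worker 127.0.0.1:8786 --local-directory /mnt/1 \
    #     --memory-limit 16GB --nthreads 1 --nprocs 6

    # or, from Python before starting anything:
    import dask
    dask.config.set({"temporary-directory": "/mnt/1"})
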
0 votes · 2 answers

Dask failing with/due to Tornado error 'too many files open'

I am running a Jupyter notebook launched from Anaconda. When trying to initialize a distributed Dask environment, the following Tornado package error is thrown:

    tornado.application - ERROR - Multiple exceptions in yield list
    Traceback (most recent…
MikeB2019x • 823 • 8 • 23
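This error usually means the per-process file-descriptor limit is too low for the number of open connections; raising the ulimit before launching is the common fix. A sketch (POSIX only, limits are illustrative):

    #   ulimit -n          # show the current limit, often 1024
    #   ulimit -n 65536    # raise it for this shell session

    # or from Python, before creating the Client:
    import resource
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(65536, hard), hard))
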
0 votes · 1 answer

Dask use broadcasted pandas.DataFrame in apply function

I have some code which samples a record from a pandas.DataFrame, k times, for each record in a dask.DataFrame. But it throws a warning:

    UserWarning: Large object of size 1.12 MB detected in task graph:
    (    metric label group_1 group_2 6251…
Georg Heiler • 16,916 • 36 • 162 • 292
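The warning suggests pre-scattering the pandas frame so it is shipped to the workers once rather than embedded in every task; a sketch, where ddf and sample_k stand in for the question's dataframe and sampling function:

    import pandas as pd
    from dask.distributed import Client

    client = Client()
    lookup = pd.DataFrame({"label": [0, 1], "weight": [0.1, 0.9]})

    # broadcast=True replicates the data to every worker up front
    lookup_future = client.scatter(lookup, broadcast=True)

    # ddf.map_partitions(sample_k, lookup_future, meta=ddf)  # placeholders
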
0 votes · 1 answer

Why is only one worker used?

I'm experimenting with Dask by running a local cluster with four workers on my laptop. I distribute a Pandas dataframe among the workers, but when I run a function on them I see from the dashboard that only one of them is actually used. What am I…
Vincenzo Lavorini • 1,884 • 2 • 15 • 26
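A common cause is that the dataframe has a single partition, so there is only one task to run; checking npartitions and repartitioning usually spreads the work. A runnable sketch:

    import pandas as pd
    import dask.dataframe as dd
    from dask.distributed import Client

    client = Client(n_workers=4)

    pdf = pd.DataFrame({"x": range(1_000_000)})
    ddf = dd.from_pandas(pdf, npartitions=1)  # one partition -> one busy worker

    ddf = ddf.repartition(npartitions=8)      # tasks now exist for all workers
    print(ddf.map_partitions(len).compute())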