Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.
Questions tagged [dask-distributed]
1090 questions
9 votes, 2 answers
distributed.worker Memory use is high but worker has no data to store to disk
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 3.91 GB -- Worker memory limit: 2.00 GB
distributed.worker - WARNING - Worker is at 41% memory…

AHassett
- 91
- 2
- 3
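
A note on the usual remedy: the warning fires when a worker's process memory exceeds its configured limit even though dask itself holds no spillable data, so the typical fixes are to raise the per-worker limit or track down the leaking library. A minimal sketch, assuming a local cluster and that 4 GB per worker fits the machine:

# Hedged sketch: raise the per-worker memory limit; the 4GB figure is an assumption.
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2, memory_limit='4GB')  # limit applies per worker
client = Client(cluster)
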
7 votes, 2 answers
How to properly use dask's upload_file() to pass local code to workers
I have functions in a local_code.py file that I would like to pass to workers through dask. I've seen answers to questions on here saying that this can be done using the upload_file() function, but I can't seem to get it working because I'm still…

Ryan Gallagher
- 93
- 1
- 8
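
For context, Client.upload_file ships a local file to every worker currently connected, after which the module can be imported inside tasks (in older versions, workers that join later may not receive it, a common reason the call seems not to work). A minimal sketch; the scheduler address and my_func are placeholders, with local_code.py taken from the question:

from dask.distributed import Client

client = Client('tcp://127.0.0.1:8786')   # placeholder address
client.upload_file('local_code.py')       # copy the module to all current workers

def run(x):
    import local_code                     # resolves on the worker after upload
    return local_code.my_func(x)          # my_func is hypothetical

futures = client.map(run, range(10))
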
7 votes, 2 answers
Dask Equivalent of pd.to_numeric
I am trying to read multiple CSV files, each around 15 GB, using dask read_csv. While performing this task, dask interprets a particular column as float; however, it has a few values that are of string type, and later on it fails when I try to…

Karrtik Iyer
- 131
- 1
- 6
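
One hedged approach to the question above: read the offending column as strings so type inference cannot fail mid-file, then coerce per partition with pandas.to_numeric (the column name and file pattern below are placeholders):

import pandas as pd
import dask.dataframe as dd

# Read the mixed column as object so dask's dtype inference cannot fail mid-file.
df = dd.read_csv('data-*.csv', dtype={'amount': 'object'})

# Coerce partition by partition; unparseable strings become NaN.
df['amount'] = df['amount'].map_partitions(
    pd.to_numeric, errors='coerce', meta=('amount', 'f8'))
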
7 votes, 2 answers
How to pass multiple arguments to dask.distributed.Client().map?
import dask.distributed

def f(x, y):
    return x, y

client = dask.distributed.Client()
client.map(f, [(1, 2), (2, 3)])
Does not work.

mathtick
- 6,487
- 13
- 56
- 101
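
The likely fix, for the record: Client.map follows the built-in map convention, taking one iterable per function parameter rather than a list of tuples. A minimal sketch:

import dask.distributed

def f(x, y):
    return x, y

client = dask.distributed.Client()
futures = client.map(f, [1, 2], [2, 3])   # calls f(1, 2) and f(2, 3)
print(client.gather(futures))             # [(1, 2), (2, 3)]
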
7 votes, 1 answer
Get ID of Dask worker from within a task
Is there a worker ID, or some unique identifier that a dask worker can access programmatically from within a task?

MRocklin
- 55,641
- 23
- 163
- 235
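
For reference, distributed exposes get_worker() for exactly this; it is only valid inside a task running on a worker. A minimal sketch:

from dask.distributed import Client, get_worker

def whoami():
    return get_worker().address   # the worker's unique address

client = Client()
print(client.submit(whoami).result())   # e.g. 'tcp://127.0.0.1:52310'
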
7 votes, 1 answer
Convert spark dataframe to dask dataframe
Is there a way to directly convert a Spark dataframe to a Dask dataframe?
I am currently using Spark's .toPandas() function to convert it into a pandas dataframe and then into a dask dataframe.
I believe this is an inefficient operation and is not…

vva
- 133
- 4
- 11
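
As far as I know neither library offers a direct converter, so the usual workaround is a shared Parquet dataset instead of the .toPandas() detour, which funnels everything through the driver. A sketch, assuming spark_df is an existing pyspark DataFrame and the path is a placeholder:

# Spark side: write the data once to shared storage.
spark_df.write.parquet('/shared/events.parquet')

# Dask side: read the same files in parallel.
import dask.dataframe as dd
ddf = dd.read_parquet('/shared/events.parquet')
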
7 votes, 1 answer
Local use of dask: to Client() or not to Client()?
I am trying to understand the use patterns for Dask on a local machine.
Specifically,
- I have a dataset that fits in memory
- I'd like to do some pandas operations
  - groupby...
  - date parsing
  - etc.
Pandas performs these operations via a single core and…

Jonathan
- 1,287
- 14
- 17
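
A rough sketch of the trade-off: dask collections run on the threaded scheduler by default, while creating a Client (even locally) adds a local cluster of worker processes plus the diagnostic dashboard. File and column names below are illustrative:

import dask.dataframe as dd
from dask.distributed import Client

client = Client()   # local cluster plus dashboard; omit to stay on threads
ddf = dd.read_csv('events-*.csv', parse_dates=['date'])
result = ddf.groupby('key').value.mean().compute()
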
7 votes, 1 answer
How do I check if there is an already running dask scheduler?
I want to start a local cluster from python with a specific number of workers, and then connect a client to it.
cluster = LocalCluster(n_workers=8, ip='127.0.0.1')
client = Client(cluster)
But before, I want to check if there is an existing local…

medRa
- 73
- 1
- 4
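
One hedged way to do the check is to attempt a connection with a short timeout and fall back to a fresh LocalCluster; the port below is dask's default scheduler port and otherwise an assumption:

from dask.distributed import Client, LocalCluster

def get_client(address='tcp://127.0.0.1:8786'):
    try:
        return Client(address, timeout='2s')   # raises OSError if nothing answers
    except OSError:
        cluster = LocalCluster(n_workers=8, ip='127.0.0.1')
        return Client(cluster)

client = get_client()
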
7 votes, 3 answers
Semaphores in dask.distributed?
I have a dask cluster with n workers and want the workers to do queries to the database. But the database is only capable of handling m queries in parallel where m < n. How can I model that in dask.distributed? Only m workers should work on such a…

Christian Trebing
- 398
- 2
- 7
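
Recent distributed releases ship a Semaphore primitive that models this directly: leases are tracked by the scheduler, so at most m tasks across the whole cluster hold one at a time. A sketch where m = 4, the query list, and run_query are placeholders:

from dask.distributed import Client, Semaphore

client = Client()
sem = Semaphore(max_leases=4, name='db')   # at most 4 concurrent holders

def query(q, sem):
    with sem:                  # blocks until a lease is free
        return run_query(q)    # run_query is hypothetical

queries = ['q1', 'q2', 'q3']
futures = client.map(query, queries, sem=sem)
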
7 votes, 2 answers
What is the default directory where dask workers store results or files?
[mapr@impetus-i0057 latest_code_deepak]$ dask-worker 172.26.32.37:8786
distributed.nanny - INFO - Start Nanny at: 'tcp://172.26.32.36:50930'
distributed.diskutils - WARNING - Found stale lock file and directory…

TheCodeCache
- 820
- 1
- 7
- 27
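
For what it's worth: by default workers spill to a dask-worker-space directory under the process's working directory, and dask-worker accepts a --local-directory flag to move it. A Python-side sketch with a placeholder path, assuming LocalCluster forwards the keyword to its workers:

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(local_directory='/tmp/dask-scratch')  # placeholder path
client = Client(cluster)
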
6 votes, 1 answer
Dask map method in function with multiple arguments
I want to apply the Client.map method to a function that uses multiple arguments as does the Pool.starmap method of multiprocessing. Here is an example
from contextlib import contextmanager
from dask.distributed import Client
@contextmanager
def…

Andrex
- 602
- 1
- 7
- 22
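
If the arguments naturally arrive as tuples, as with Pool.starmap, one hedged option is a small unpacking wrapper around Client.map:

from dask.distributed import Client

def f(x, y):
    return x + y

def star(args):
    return f(*args)   # unpack one tuple into f's parameters

client = Client()
futures = client.map(star, [(1, 2), (3, 4)])   # starmap-style
print(client.gather(futures))                  # [3, 7]
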
6 votes, 2 answers
Reload Dask worker containers automatically on code change
I have the Dask code below that submits N workers, where each worker is implemented in a Docker container:
default_sums = client.map(process_asset_defaults, build_worker_args(req, numWorkers))
future_total_sum = client.submit(sum,…

ps0604
- 1,227
- 23
- 133
- 330
6 votes, 3 answers
Deploying a cluster of containers in Azure
I have a Docker application that works fine on my laptop on Windows using compose and starting multiple instances of a container as a Dask cluster.
The name of the service is "worker" and I start two container instances like so:
docker compose up…

ps0604
- 1,227
- 23
- 133
- 330
6 votes, 2 answers
Dask distributed.scheduler - ERROR - Couldn't gather keys
import joblib
from sklearn.externals.joblib import parallel_backend

with joblib.parallel_backend('dask'):
    from dask_ml.model_selection import GridSearchCV
    import xgboost
    from xgboost import XGBRegressor
    grid_search =…

praveen pravii
- 193
- 2
- 9
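
Not a diagnosis of the gather error, but a cleaner baseline for the joblib backend: keep the imports outside the context manager, hold a live Client, and note that sklearn.externals.joblib has long been deprecated in favor of plain joblib. The sketch below also swaps in scikit-learn's GridSearchCV, since dask_ml's version parallelizes with dask on its own and does not need the joblib backend; the data is synthetic:

import joblib
import numpy as np
from dask.distributed import Client
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

client = Client()                      # the backend needs a reachable scheduler
X, y = np.random.rand(100, 4), np.random.rand(100)

search = GridSearchCV(XGBRegressor(), {'max_depth': [2, 4]})
with joblib.parallel_backend('dask'):  # fit tasks run on the dask workers
    search.fit(X, y)
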
6 votes, 2 answers
Dask Memory leakage issue with json and requests
This is just a minimal sample test to reproduce a memory leakage issue in a remote Dask kubernetes cluster.
def load_geojson(pid):
    import requests
    import io
    r =…

jsanjayce
- 272
- 5
- 15
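
Not the poster's fix, but a common first step when request-heavy tasks appear to leak: close responses deterministically so their buffers are released when the task ends. A sketch with a placeholder URL:

import requests

def load_geojson(pid):
    url = f'https://example.com/geo/{pid}.json'   # placeholder endpoint
    with requests.Session() as session:
        with session.get(url) as response:
            data = response.json()   # parse, then let the response be freed
    return len(data.get('features', []))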