Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate-sized clusters.
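A minimal sketch of the concurrent.futures-style interface, assuming a small in-process local cluster instead of a real multi-machine one. `square` and `run_demo` are hypothetical names used only for illustration.

```python
from dask.distributed import Client, as_completed


def square(x):
    # Hypothetical placeholder task
    return x * x


def run_demo():
    # processes=False keeps the scheduler and workers in this process (assumed for a quick local run)
    with Client(processes=False, n_workers=2, threads_per_worker=1) as client:
        # submit() returns a Future, much like concurrent.futures.Executor.submit
        single = client.submit(square, 4)
        # map() returns a list of Futures; as_completed yields each one as it finishes
        futures = client.map(square, range(5))
        finished = sorted(f.result() for f in as_completed(futures))
        return single.result(), finished


if __name__ == "__main__":
    print(run_demo())
```

Replacing `Client(processes=False, ...)` with `Client("scheduler-host:8786")` (a hypothetical address) would run the same code against an existing cluster.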
Questions tagged [dask-distributed]
1090 questions
6 votes, 0 answers
Is it possible to launch Dask clusters on an HPC (SLURM) system remotely from a local computer?
I am new to Dask. I understand that to start a Dask cluster, I would normally have to ssh to my HPC cluster and run SLURMCluster() there; once it has started, I need to call Client('node_ip') on my local computer. I was…

user252046
6 votes, 1 answer
Specify dashboard port for dask
Is there a way to manually specify the port for the dashboard when creating a Dask cluster using dask-jobqueue? When 8787 is taken, it randomly picks a different port, which means that one needs to set up a different tunnel every time.
from…
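A sketch of pinning the dashboard port, assuming a `LocalCluster` so it runs anywhere; port 8790 is an arbitrary example. For dask-jobqueue clusters such as `SLURMCluster`, recent versions accept a `scheduler_options` dict that is forwarded to the scheduler, e.g. `SLURMCluster(..., scheduler_options={"dashboard_address": ":8790"})`; check that your installed version supports it. If the chosen port is already in use, Dask may still warn and fall back to another one.

```python
from dask.distributed import Client, LocalCluster


def start_with_dashboard(port=8790):
    # dashboard_address asks the scheduler to serve the dashboard on this port
    cluster = LocalCluster(
        n_workers=1,
        threads_per_worker=1,
        processes=False,
        dashboard_address=f":{port}",
    )
    return cluster, Client(cluster)


if __name__ == "__main__":
    cluster, client = start_with_dashboard()
    # Empty string if the dashboard could not start (e.g. bokeh is not installed)
    print(cluster.dashboard_link)
    client.close()
    cluster.close()
```

With a fixed port, the SSH tunnel only has to be set up once.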

tlamadon
6 votes, 0 answers
Huge memory use difference between dask and dask.distributed
I am trying to use dask.delayed to compute a large matrix for use in a later calculation. I am only ever running the code on a single local machine. When I use a dask single-machine scheduler it works fine, but is a little slow. To access more…

Nick W.
6 votes, 1 answer
Parallel Sklearn Model Building with Dask or Joblib
I have a large set of sklearn pipelines that I'd like to build in parallel with Dask. Here's a simple but naive sequential approach:
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from…
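One way to fit independent pipelines in parallel with dask.distributed is to submit one fit task per pipeline. This is a sketch, not necessarily the approach from the answer. The synthetic data and the two pipelines stand in for the question's truncated code, and `fit_pipeline` / `build_in_parallel` are hypothetical helpers.

```python
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def fit_pipeline(pipeline, X, y):
    # Each task fits one independent pipeline on a worker
    return pipeline.fit(X, y)


def build_in_parallel():
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    pipelines = [
        # MinMaxScaler keeps features non-negative, which MultinomialNB requires
        make_pipeline(MinMaxScaler(), MultinomialNB()),
        make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=500)),
    ]
    with Client(processes=False, n_workers=2, threads_per_worker=1) as client:
        # Ship the training data once instead of embedding it in every task
        X_f, y_f = client.scatter([X, y], broadcast=True)
        futures = [client.submit(fit_pipeline, p, X_f, y_f) for p in pipelines]
        return client.gather(futures), X, y
```

joblib's `"dask"` backend is an alternative when the parallelism lives inside scikit-learn itself (e.g. `n_jobs` in a grid search).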

slaw
6 votes, 1 answer
How can a dask worker access the total number of workers currently in the cluster?
My dask workers need to run init code that depends on the number of workers in the cluster. Can workers access such cluster metadata?
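A sketch of reading the worker count from inside a task, assuming a local threaded cluster. `count_workers` and `run_demo` are hypothetical names. Inside a task, `get_client()` returns a client connected to the same scheduler, and `Client.nthreads()` maps each worker address to its thread count. The result is a snapshot: workers that join later are not reflected.

```python
from dask.distributed import Client, LocalCluster, get_client


def count_workers():
    # Runs on a worker: reuse a client connected to the same scheduler
    client = get_client()
    # One dict entry per worker currently known to the scheduler
    return len(client.nthreads())


def run_demo(n_workers=3):
    with LocalCluster(n_workers=n_workers, threads_per_worker=1, processes=False) as cluster:
        with Client(cluster) as client:
            # Make sure every worker has registered before counting
            client.wait_for_workers(n_workers)
            return client.submit(count_workers).result()
```

For init code that must run once on every worker, `client.run(func)` executes a function on all currently connected workers.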

Randy Gelhausen
6 votes, 1 answer
Dask prints a warning to use client.scatter although I'm using the suggested approach
In dask distributed I get the following warning, which I would not expect:
/home/miniconda3/lib/python3.6/site-packages/distributed/worker.py:739: UserWarning: Large object of size 1.95 MB detected in task graph:
…
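For reference, a minimal sketch of the pattern the warning recommends, with a hypothetical array of about 2 MB and a hypothetical `column_total` task: scatter the large object once, then pass the returned Future to `submit`. If the warning persists, one possibility is that the object still reaches the graph another way, for example as a closure variable or default argument of the submitted function, since the function itself is serialized too.

```python
import numpy as np
from dask.distributed import Client


def column_total(data, i):
    # Hypothetical task that only needs a reference to the shared array
    return float(data[:, i].sum())


def run_demo():
    big = np.ones((500, 500))  # about 2 MB, similar in size to the object in the warning
    with Client(processes=False, n_workers=1, threads_per_worker=2) as client:
        # scatter() sends the array to the workers once and returns a Future
        big_future = client.scatter(big)
        # Pass the Future, not the array, so tasks reference the data by key
        futures = [client.submit(column_total, big_future, i) for i in range(3)]
        return client.gather(futures)
```

Note that scattering a Python list splits it into one Future per element, so wrap a list in another container if it should stay whole.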

dennis-w
5 votes, 1 answer
Apply a function over the columns of a Dask array
What is the most efficient way to apply a function to each column of a Dask array? As documented below, I've tried a number of things but I still suspect that my use of Dask is rather amateurish.
I have a quite wide and quite long array, in the…
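One common approach, sketched here with a column median as a stand-in for the unspecified function: rechunk so that every chunk spans all rows, then apply a NumPy reduction to each block with `map_blocks`. `column_medians` is a hypothetical helper.

```python
import numpy as np
import dask.array as da


def column_medians(x):
    # A median needs the whole column, so first make every chunk span all rows
    full_columns = x.rechunk({0: -1})
    # Each block now holds complete columns; reduce along axis 0 and drop that axis
    return full_columns.map_blocks(
        lambda block: np.median(block, axis=0),
        drop_axis=0,
        dtype=float,
        meta=np.array((), dtype=float),
    )
```

Because each chunk then contains every row, a very long array needs narrow column chunks to keep memory per task bounded. For functions that only accept 1-D input, `da.apply_along_axis` is an alternative.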

chameau13
5 votes, 1 answer
Implement Equal-Width Intervals feature engineering in Dask
In equal-width discretization, the variable values are assigned to intervals of the same width. The number of intervals is user-defined and the width is determined by the minimum/maximum values and the number of intervals.
For example, given the…
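A sketch of equal-width discretization on a Dask array, following the definition above: width = (max − min) / number of intervals. `equal_width_codes` is a hypothetical helper. It uses left-closed intervals with the maximum folded into the last one, whereas `pandas.cut` defaults to right-closed intervals, so values exactly on a boundary can land in different bins.

```python
import dask
import dask.array as da


def equal_width_codes(x, n_bins):
    # Compute min and max in a single pass over the data
    lo, hi = dask.compute(x.min(), x.max())
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are equal: put everything in the first interval
        return da.zeros_like(x, dtype="int64")
    codes = da.floor((x - lo) / width).astype("int64")
    # The maximum lands exactly on the upper edge; fold it into the last interval
    return da.clip(codes, 0, n_bins - 1)
```

For the values 0 to 10 with 5 intervals, the width is 2, so 0 and 1 map to interval 0, and 8, 9 and 10 map to interval 4.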

ps0604
5 votes, 1 answer
Dask computations slow down with time
I'm having the following issue with Dask: the same computations take longer and longer as time passes. After I restart the scheduler, the computations are fast again, and then they keep slowing down. The figure below shows the time consumed by…

rafgonsi
5 votes, 0 answers
Dask distributed KeyError
I am trying to learn Dask using a small example. Basically I read in a file and calculate row means.
from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(cores=4, memory='24 GB')
cluster.scale(4)
from dask.distributed import Client
client…

Phoenix Mu
5 votes, 2 answers
Dask: How to Add Security (TLS/SSL) to Dask Cluster?
I'm trying to figure out how to add a security layer to my Dask Cluster deployed using helm on GKE on GCP, that would force a user to input the certificate and key files into the Security Object, as explained in this documentation [1].…
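A client-side sketch of the `Security` object mentioned in the question, assuming PEM files already issued for the cluster. The file names and the scheduler address are hypothetical. The scheduler and workers in the Helm deployment must also be configured with TLS certificates; that part is not shown here.

```python
from distributed import Client
from distributed.security import Security


def make_security(ca_file, cert_file, key_file):
    # Paths are hypothetical; they must point at the PEM files for your cluster
    return Security(
        tls_ca_file=ca_file,
        tls_client_cert=cert_file,
        tls_client_key=key_file,
        require_encryption=True,
    )


def connect(address, security):
    # The address must use the tls:// scheme, e.g. "tls://scheduler-host:8786"
    return Client(address, security=security)
```

With `require_encryption=True`, the client refuses plain `tcp://` connections, so a user without the certificate and key files cannot connect.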

Riley Hun
5 votes, 1 answer
local dask cluster using docker-compose
I want to create a docker-compose.yml containing our company's analysis toolchain. For this purpose, I added Dask. The docker-compose.yml looks like this:
docker-compose.yml
version: '3'
services:
  jupyter:
    build: docker/jupyter/.
    ports:
      -…

user2757652
5 votes, 1 answer
Dask: Submit continuously, work on all submitted data
I have 500 continuously growing DataFrames, and I would like to submit operations on their data (independent for each DataFrame) to Dask. My main question is: can Dask hold the continuously submitted data, so that I can submit a function on all the submitted…
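A sketch of the submit-then-combine pattern, assuming small lists as stand-ins for the growing DataFrames. `process_batch`, `combine` and `run_demo` are hypothetical names. Each result stays in worker memory while a Future references it, and a list of Futures passed to `submit` is replaced by their results before the function runs.

```python
from dask.distributed import Client


def process_batch(rows):
    # Hypothetical per-batch work: here, just count the rows
    return len(rows)


def combine(results):
    # Runs once over every previously submitted result
    return sum(results)


def run_demo():
    incoming = [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10]]  # stands in for newly arriving data
    with Client(processes=False, n_workers=2, threads_per_worker=1) as client:
        futures = []
        for batch in incoming:
            # Results are held on the workers as long as these Futures exist
            futures.append(client.submit(process_batch, batch))
        # The list of Futures is resolved to their results on the worker
        return client.submit(combine, futures).result()
```

Dropping a Future (for example, removing it from the list) lets the scheduler release the corresponding data.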

gies0r
5 votes, 0 answers
Dask distributed.nanny - WARNING - Restarting worker issue
I am using Dask and am a bit confused.
When I run the code below, I just get this warning until the process crashes.
It uses 100% of all 4 CPU cores while it's failing.
Can anyone advise me?
distributed.nanny - WARNING - Restarting worker
Here is the code
import…

kikee1222
5 votes, 1 answer
Streamz/Dask: gather does not wait for all results of buffer
Imports:
from dask.distributed import Client
import streamz
import time
Simulated workload:
def increment(x):
    time.sleep(0.5)
    return x + 1
Let's suppose I'd like to process some workload on a local Dask client:
if __name__ == "__main__":
    …

daniel451