Questions tagged [dask-kubernetes]

Questions about using dask-kubernetes to create and run dask distributed clusters

Dask Kubernetes deploys Dask workers on Kubernetes clusters using native Kubernetes APIs. It is designed to dynamically launch short-lived deployments of workers during the lifetime of a Python process.

Full Documentation

54 questions
2 votes, 1 answer

Working with big data on Dask Kubernetes in Azure Kubernetes Service (AKS)

I want to analyze a dataset (such as a CSV file) of 8 GB that is on my laptop's hard disk. I have already set up a Dask Kubernetes cluster on AKS with 1 scheduler and 3 workers with 7 GB each. How can I work on my dataset using this Dask Kubernetes…
dev
2 votes, 1 answer

`dask-kubernetes` scheduler - worker on AWS

I've been trying to set up a dask.distributed cluster using Kubernetes. Setting up the kube cluster itself is pretty straightforward; the problem I am currently struggling with is that I can't get the local scheduler to connect to the workers.…
1 vote, 1 answer

Error trying to use Dask on Kubernetes with distributed workers

I'm attempting to deploy a dask application on Kubernetes/Azure. I have a Flask application server that is the client of a Dask scheduler/workers. I installed the Dask operator as described here: helm install --repo https://helm.dask.org…
ps0604
1 vote, 0 answers

Dask Kubernetes Operator can't reach apis/external.metrics.k8s.io/v1beta1

My Dask Kubernetes Operator fails to start, because of this error: [2022-12-04 12:17:28,456] kopf._core.reactor.o [ERROR ] Request attempt #9/9 failed; escalating: GET https://10.64.0.1:443/apis/external.metrics.k8s.io/v1beta1 ->…
1 vote, 0 answers

How to use all vCPUs on google cloud with dask

There are 16 vCPUs on my vertex AI Jupyter notebook, and I am writing a parallelized script. I wasn't sure if the right approach was to hardcode in parallel processing based on the number of vCPUs (and if so, how to choose nworkers vs nthreads).…
1 vote, 1 answer

dask-gateway on K8s using helm3: Error: failed to install CRD crds/daskclusters.yaml

I'm following the instructions to set up Dask on a K8s cluster. I'm on macOS, have K8s running on Docker Desktop, kubectl version 1.22.5 and helm version 3.8.0. After adding the repository, downloading default configuration, installing helm chart using…
F Baig
1 vote, 1 answer

Add a Persistent Volume Claim to a Kubernetes Dask Cluster

I am running a Dask cluster and a Jupyter notebook server on cloud resources using Kubernetes and Helm. I am using a YAML file for the Dask cluster and Jupyter, initially taken from…
1 vote, 1 answer

Dask cluster is not starting up

I am trying to start a Dask cluster, but it fails with the error below: Timed out trying to connect to 'tcp://100.100.160.25:2323' after 10 s: Timed out trying to connect to 'tcp://100.100.160.25:2323' after 10 s: connect() didn't finish in time
1 vote, 0 answers

Dask Client fails to connect to cluster when running inside a Docker container

I am running Dask Gateway in a Kubernetes namespace. I am able to connect to the Gateway using the following code, while not running in a Docker container. from dask.distributed import Client from dask_gateway import Gateway gateway =…
jrdzha
1 vote, 0 answers

Why does my Dask job's performance get worse after five workers?

I am running Dask on an eight-node Kubernetes cluster with my manifest specifying one scheduler replica and eight worker replicas. My code is processing 80 files of about equal size, and I wanted to see how performance scales from one worker to…
user655321
1 vote, 1 answer

How to configure a GCP cluster for dask-workers in a different region from the one where the scheduler was started

I have one Kubernetes cluster in region us-east1 where dask-scheduler was started, and I want to start another cluster in region us-west1 where I would like to run dask-workers. As I understand it, the connection between scheduler and workers is bidirectional, so…
Habibutsu
1 vote, 1 answer

Why do my Dask Futures get stuck in 'pending' and never finish?

I have some long-running code (~5-10 minute processing) that I'm trying to run as a Dask Future. It's a series of several discrete steps that I can either run as one function: result : Future = client.submit(my_function, arg1, arg2) Or I can split…
user655321
1 vote, 1 answer

Is it possible to select workers for specific tasks in Dask?

I have a process I'm running on my Kubernetes cluster with Dask that consists of two map-reduce phases, but both maps across the nodes download potentially numerous large files to each worker. In order to avoid having two different machines process…
user655321
1 vote, 2 answers

Connect to existing Kubernetes Dask cluster

Using Helm, I've created a Dask cluster. NAME READY STATUS RESTARTS AGE dask01-jupyter-aaa-aaaa 1/1 Running 0 3d19h dask01-scheduler-bbb-bbbb 1/1 Running 0 …
Seanny123
1 vote, 1 answer

dask-kubernetes: Issue creating pod with uppercase username

I am learning dask-kubernetes on GKE. I stumbled across an asyncio error (ERROR:asyncio:Task exception was never retrieved). See steps below for the issue. However, additional guidance on deploying dask-kubernetes with a remote Kubernetes…
Ray Bell