I have a Dask cluster on AKS, and I want to run a function `f` in parallel, but each invocation must run alone in a single process, allocated to a single pod. According to the documentation on Worker Resources, I should start each worker with `dask-worker scheduler:8786 --nthreads 6 --resources "process=1"`. I need this because `f` uses multithreading internally.
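For context, here is a minimal local sketch of the behavior I expect, using a stand-in `f` and a `LocalCluster` instead of the real AKS setup: each worker advertises one abstract `process` resource, so no worker should ever run two of these tasks at once.

```python
from dask.distributed import Client, LocalCluster

def f(i):
    return i * 2  # stand-in for the real multithreaded function

# Two in-process, single-threaded workers, each advertising one "process" resource.
cluster = LocalCluster(n_workers=2, threads_per_worker=1,
                       processes=False, resources={"process": 1})
client = Client(cluster)

# Distinct arguments give distinct task keys; each task consumes the whole
# "process" resource of whichever worker runs it, so tasks never share a worker.
futures = [client.submit(f, i, resources={"process": 1}) for i in range(4)]
results = [ft.result() for ft in futures]
print(results)  # [0, 2, 4, 6]

client.close()
cluster.close()
```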
```python
# Example adapting up to 10 pods
from dask.distributed import Client
from dask_kubernetes import KubeCluster

cluster = KubeCluster(pod_template="pod_template.yml", deploy_mode="remote")
cluster.adapt(minimum=0, maximum=10)
client = Client(cluster)

# for this example, suppose f takes no arguments
futures = [client.submit(f, resources={"process": 1}) for _ in range(5)]  # five executions of f (could use map, but this is just an example)
results = [ft.result() for ft in futures]
```
When I execute the code above, 5 worker pods come up, but all the executions of `f` are carried out sequentially on just one of them.

If instead of the `adapt` method I set `cluster.scale(5)` manually, the executions of `f` run as I intend most of the time. I say most of the time because sometimes the behavior is the same as with `adapt`. This behavior seems very strange to me.
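One thing I can inspect is what the scheduler believes each worker registered; a sketch of that check via `client.scheduler_info()`, shown here against a stand-in `LocalCluster` rather than the real AKS cluster:

```python
from dask.distributed import Client, LocalCluster

# Stand-in cluster; on AKS, `client` would be attached to the KubeCluster instead.
cluster = LocalCluster(n_workers=2, threads_per_worker=1,
                       processes=False, resources={"process": 1})
client = Client(cluster)

# One entry per worker: its address and the resources it registered.
worker_resources = {addr: info.get("resources", {})
                    for addr, info in client.scheduler_info()["workers"].items()}
print(worker_resources)

client.close()
cluster.close()
```

If a pod shows up here without the `process` resource, the scheduler would have to route every resource-constrained task to the workers that do have it.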
Here is my `pod_template.yaml` file:
```yaml
apiVersion: v1
kind: Pod
spec:
  restartPolicy: Never
  containers:
    - image: MyCustomDockerImage
      imagePullPolicy: IfNotPresent
      args: [dask-worker, --nthreads, '6', --no-dashboard, --memory-limit, 8GB, --death-timeout, '60', --resources, 'process=1']
      name: testdask
      resources:
        limits:
          cpu: "8"
          memory: 8G
        requests:
          cpu: "8"
          memory: 8G
  imagePullSecrets:
    - name: acr-secret
  tolerations:
    - key: workloadpool
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  nodeSelector:
    nodepool: workloadpool
```