I have set up an AWS EMR cluster with 10 core nodes of type g4dn.xlarge (each node contains 1 GPU). When I run the following commands in a Zeppelin notebook, I see only 1 worker allotted in my LocalCUDACluster:

    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client
    cluster = LocalCUDACluster()
    client = Client(cluster)

I tried passing n_workers=10 explicitly but it resulted in an error.

How do I make sure my LocalCUDACluster utilizes all of my other 9 nodes? What is the right way to set up a multi-node Dask-CUDA cluster? Any help regarding this is appreciated.

Putt
  • It looks like you have 10 separate machines, while a local cluster will only use the local GPU. – SultanOrazbayev Jun 07 '22 at 13:58
  • Yes, that is the case. What is the right cluster initialization API for my use case, instead of LocalCUDACluster()? – Putt Jun 07 '22 at 14:32
  • Take a look at Coiled's GPU cluster page (https://docs.coiled.io/user_guide/gpu.html). For a DIY solution, you'll need to have each GPU machine connect to a scheduler and then connect the client to the scheduler. – SultanOrazbayev Jun 07 '22 at 14:36
  • Right, so the scheduler file will live on the scheduler machine, so one would have to download that file and then use `client = distributed.Client(scheduler_file='somefile.json')`. There is the extra complication of making sure that the relevant communication ports are open (the easiest thing is to open up everything, but that's not secure). Apart from Coiled, there is also Saturn Cloud, which offers a similar service. – SultanOrazbayev Jun 08 '22 at 16:26
  • Also, this might work for you: https://cloudprovider.dask.org/en/latest/ – SultanOrazbayev Jun 08 '22 at 16:32
  • Thanks a lot for replying. The cluster and client are up and running now. I'll also look at these other services/libraries. Thanks a lot. – Putt Jun 08 '22 at 16:47
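
The DIY approach described in the comments above (one scheduler, one worker process per GPU machine, then a client pointed at the scheduler file) might look roughly like the sketch below. It is not a definitive setup. It assumes that `dask-scheduler` (from `distributed`) and `dask-cuda-worker` (from `dask-cuda`) are installed on every node, that the scheduler file is readable from every node (for example on a shared filesystem, or copied over after the scheduler starts), and that the required ports are open between nodes. The helper names, the path `/shared/scheduler.json`, and the `expected_workers` parameter are hypothetical and exist only for illustration.

```python
import shlex


def scheduler_command(scheduler_file):
    """Command to run once, on the node chosen to host the scheduler."""
    return ["dask-scheduler", "--scheduler-file", scheduler_file]


def worker_command(scheduler_file):
    """Command to run on each GPU node; it starts one worker per visible GPU."""
    return ["dask-cuda-worker", "--scheduler-file", scheduler_file]


def connect(scheduler_file, expected_workers):
    """Connect a client (e.g. from the notebook) and wait for the workers to join."""
    # Imported lazily so the command helpers above work without dask installed.
    from dask.distributed import Client

    client = Client(scheduler_file=scheduler_file)
    # Block until the expected number of workers has registered with the scheduler.
    client.wait_for_workers(expected_workers)
    return client


if __name__ == "__main__":
    path = "/shared/scheduler.json"  # hypothetical location visible to all nodes
    print(shlex.join(scheduler_command(path)))  # run on the scheduler node
    print(shlex.join(worker_command(path)))     # run on every GPU node
```

Once the scheduler and workers are running, calling `connect("/shared/scheduler.json", expected_workers=10)` from the notebook would take the place of `Client(LocalCUDACluster())` in the original snippet, assuming all 10 nodes run a worker.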

1 Answer

There are a few options to set up a multi-worker cluster (with or without GPUs), described here.

The docs don't seem to mention third-party solutions, but right now there are two companies offering these services: Coiled and Saturn Cloud.

SultanOrazbayev