I want to use Dask on Databricks. It should be possible (I cannot see why not). When I import it, one of two things happens: either I get an ImportError, or, once I install distributed to fix that, Databricks just says Cancelled without throwing any errors.

2 Answers
Anyone looking for an answer: check this medium blogpost. To prevent people from missing it in the comments, I'm posting this as an answer.

I don't think we have heard of anyone using Dask under Databricks, but so long as it's just Python, it may well be possible.
The default scheduler for Dask is threads, and this is the most likely thing to work. In this case you don't even need to install distributed.
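To make that concrete, here is a minimal sketch of a computation that uses only core dask with the threaded scheduler, so no distributed package and no extra processes are needed (the `square` function is just an illustrative placeholder, not from the answer):

```python
import dask


@dask.delayed
def square(x):
    # Placeholder work; each call becomes a task in the graph
    return x * x


# Build a small task graph and run it entirely in threads
# of the current process.
total = dask.delayed(sum)([square(i) for i in range(4)]).compute(
    scheduler="threads"
)
print(total)  # 0 + 1 + 4 + 9 = 14
```

Because everything runs inside the driver process's threads, this mode avoids the process-spawning restrictions that seem to be the problem here.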
For the Cancelled error, it sounds like you are using distributed, and, at a guess, the system is not allowing you to start extra processes (you could test this with the subprocess module). To work around it, you could do

import dask.distributed
client = dask.distributed.Client(processes=False)
Of course, if it is indeed the processes that you need, this would not be great. Also, I have no idea how you might expose the dashboard's port.
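The subprocess check suggested above could be sketched like this (a quick diagnostic, not from the answer itself): if this fails, the environment is blocking new processes, which would explain why distributed's default multi-process workers get cancelled.

```python
import subprocess
import sys

# Try to spawn one extra Python process; if the platform forbids
# process creation, this raises or returns a nonzero exit code.
result = subprocess.run(
    [sys.executable, "-c", "print('spawned ok')"],
    capture_output=True,
    text=True,
    timeout=30,
)
print(result.stdout.strip())
```

If the child process prints successfully, process creation is allowed and the Cancelled error likely has another cause.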

- This sadly still didn't work. However, this is starting to appear as a genuine limitation of Databricks itself, which is sad because I actually think Dask is the future of distributed computing in Python. – SARose Jun 06 '19 at 15:01
- Don't tell Databricks that! :) – mdurant Jun 06 '19 at 15:36
- Hi SARose - I'm curious as to WHY you want to use Dask on Databricks? i.e. what's the driver here? – Rodney Jul 31 '19 at 04:40
- 1. For data folks (RS, DS, MLEs), Spark errors and the interplay between Spark <=> ML libraries are substandard at best. 2. Even with Koalas (pandas_on_spark), the underlying code is Scala/Spark, with very opaque tasks that the actual user usually has no transparency into for debugging. – anakin Apr 18 '22 at 18:09