
I am using Databricks Community Edition, and the cluster on which my notebook is running shows:

[Cluster info screenshot]

that it has a driver with 15 GB memory and 2 cores.

Whereas when I get the Spark config in my notebook, it shows [Spark config screenshot]. Why is it still showing local[8]?

Is it running on my local machine or on the cluster ?

What I am assuming is that the Community Edition is just giving me a driver node and is using my local machine's CPU cores as executor nodes.

Karan Dhar

1 Answer


No, on Databricks Community Edition everything runs in the cloud; nothing runs on your local machine. Community Edition is similar to running Spark on your local laptop without executors, but it's a managed service for trying things out quickly, accessing data in cloud storage, etc. You can access it from anywhere; there is no need to have Spark installed on your device.

Regarding local[8]: most probably it's a configuration issue. When starting Spark in local mode you can request as many worker threads as you want, e.g. local[64], but if the machine has only 2 cores, those threads are simply time-sliced across the 2 cores that exist.
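To make the local[N] semantics concrete, here is a rough plain-Python sketch of how the master URL maps to a thread count. Note that `local_mode_threads` is a hypothetical helper written for illustration, not part of the Spark API; in a real notebook you would inspect `spark.sparkContext.master` instead.

```python
import os
import re

def local_mode_threads(master: str) -> int:
    """Illustrative only: number of worker threads Spark starts for a local[...] master."""
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if not m:
        raise ValueError(f"not a local-mode master URL: {master!r}")
    n = m.group(1)
    # local[*] means "one thread per available CPU core";
    # local[N] means exactly N threads, regardless of core count.
    return os.cpu_count() if n == "*" else int(n)

# local[8] asks for 8 threads even on a 2-core driver; the OS just
# time-slices those 8 threads over the 2 physical cores.
print(local_mode_threads("local[8]"))  # 8
```

This is why the UI can report a 2-core driver while the Spark config says local[8]: the 8 is a requested thread count, not a guarantee of 8 physical cores.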

Alex Ott
  • Can we assume that running Spark in Databricks Community Edition is similar to running scikit-learn, since scikit-learn executes machine learning algorithms on a single machine? – Sama Feb 25 '23 at 05:36
  • I ran an ML application twice, first using Databricks Community Edition (only 1 driver), and then using standard Databricks with 1 driver and 2 workers. The execution time dramatically decreased in the second run. Can we consider that the Community Edition has no cluster at all? – Sama Feb 25 '23 at 05:40
  • Community Edition uses so-called "single node clusters" - you can create them on the standard edition as well. Really, Community Edition is just to try what it's like to work on Databricks, not to provide a full environment. All compute resources cost money… – Alex Ott Feb 25 '23 at 07:38