I have databricks-connect 6.6.0 installed, which ships with Spark 2.4.6. I have been using the Databricks cluster until now, but I am trying to switch to a local Spark session for unit testing. However, every time I run my code, the job still shows up on the cluster's Spark UI as well as on the local Spark UI at xxxxxx:4040.

I have tried initiating the session via SparkConf(), SparkContext(), and SQLContext(), but they all behave the same way. I have also set SPARK_HOME, HADOOP_HOME, and JAVA_HOME correctly, downloaded winutils.exe separately, and made sure none of these directories contain spaces. I have also tried running the code from the console as well as from the terminal with spark-submit.

This is one of the pieces of sample code I tried:

from pyspark.sql import SparkSession

# Intended to build a purely local session, but the job still appears on the cluster UI
spark = SparkSession.builder.master("local").appName("name").getOrCreate()
inp = spark.createDataFrame([('Person1', 12), ('Person2', 14)], ['person', 'age'])
op = inp.toPandas()

I am using: Windows 10, databricks-connect 6.6.0, Spark 2.4.6, JDK 1.8.0_265, Python 3.7, PyCharm Community 2020.1.1

Do I have to override the default/global Spark session to initiate a local one? How would I do that? I might be missing something; the code itself runs fine, it's just a matter of local vs. cluster execution.

TIA

lesk_s

1 Answer

You can't run them side by side. I recommend creating two virtual environments with Conda: one for databricks-connect and one for plain pyspark. Then just switch between the two as needed.
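
For example, here is a minimal sketch of how the pyspark-only environment could be used for local unit testing. The pytest fixture and the test below are illustrative assumptions on my part, not part of pyspark or databricks-connect:

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # With only plain pyspark installed in the active environment,
    # this builds a purely local session that never touches the cluster.
    session = (SparkSession.builder
               .master("local[2]")
               .appName("unit-tests")
               .getOrCreate())
    yield session
    session.stop()

def test_to_pandas(spark):
    inp = spark.createDataFrame([('Person1', 12), ('Person2', 14)], ['person', 'age'])
    assert inp.toPandas().shape == (2, 2)

The key point is that the environment where the tests run has pyspark installed but not databricks-connect, since the two cannot coexist and databricks-connect directs sessions to the cluster.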

simon_dmorias