I've installed databricks-connect on Windows 10 with the instructions here: https://docs.databricks.com/dev-tools/databricks-connect.html

After running databricks-connect configure and entering all values, I'm running databricks-connect test. This is the output I'm getting, and then it hangs:

* PySpark is installed at c:\users\user\.conda\envs\myenv\lib\site-packages\pyspark
* Checking SPARK_HOME
* Checking java version
java version "1.8.0_251"
Java(TM) SE Runtime Environment (build 1.8.0_251-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.251-b08, mixed mode)
* Skipping scala command test on Windows
* Testing python command
The system cannot find the path specified.

Digging a bit deeper, it seems that the underlying pyspark package fails to initialize. It fails on this line:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

When I try to run this manually, it hangs. I guess the problem is with either the local Spark installation or the required Hadoop setup (winutils.exe), but databricks-connect requires a fresh pyspark installation (the docs say to uninstall pyspark prior to installing it).
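For reference, this is roughly how I'm checking the environment variables involved before creating the session (the helper name `check_spark_env` is just illustrative; on Windows, Spark typically expects winutils.exe under %HADOOP_HOME%\bin):

```python
import os
from pathlib import Path

def check_spark_env():
    """Report the env vars Spark relies on, plus whether winutils.exe is present."""
    results = {var: os.environ.get(var)
               for var in ("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME")}
    hadoop_home = results["HADOOP_HOME"]
    if hadoop_home and Path(hadoop_home, "bin", "winutils.exe").exists():
        results["winutils.exe"] = str(Path(hadoop_home, "bin", "winutils.exe"))
    else:
        results["winutils.exe"] = None
    return results

if __name__ == "__main__":
    for name, value in check_spark_env().items():
        print(f"{name}: {value or 'NOT SET / NOT FOUND'}")
```

In my case all three variables are set, so I'm not sure where the hang comes from.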

Would be happy for any references for:

  1. Fixing the databricks-connect issue
  2. Fixing the underlying pyspark installation issue