I've installed databricks-connect
on Windows 10 with the instructions here: https://docs.databricks.com/dev-tools/databricks-connect.html
After running databricks-connect configure
and entering all values, I'm running databricks-connect test
. This is the output I'm getting, and then it hangs:
* PySpark is installed at c:\users\user\.conda\envs\myenv\lib\site-packages\pyspark
* Checking SPARK_HOME
* Checking java version
java version "1.8.0_251"
Java(TM) SE Runtime Environment (build 1.8.0_251-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.251-b08, mixed mode)
* Skipping scala command test on Windows
* Testing python command
The system cannot find the path specified.
Digging a bit deeper, it seems that the underlying pyspark
package fails to initialize. It fails on these lines:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
When I try to run this manually, it hangs. I guess the problem is with either the local Spark installation or the required Hadoop (and winutils.exe) installation, but databricks-connect
requires a fresh pyspark installation (the docs say to uninstall pyspark before installing it).
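In case it helps with diagnosing, here's a small sketch I put together myself (the variable names are the standard Spark/Hadoop ones, but the checks are my own, not from the databricks-connect docs) to dump the environment variables Spark's launcher consults on Windows and to confirm whether winutils.exe is where Spark expects it:

```python
import os
from pathlib import Path

def spark_env_report():
    """Collect the environment variables Spark's launcher typically consults.

    Missing or stale values here are a common cause of
    "The system cannot find the path specified." on Windows.
    """
    names = ("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME", "PYSPARK_PYTHON")
    return {name: os.environ.get(name, "<not set>") for name in names}

def winutils_present():
    """Return True if winutils.exe exists under %HADOOP_HOME%\\bin."""
    hadoop_home = os.environ.get("HADOOP_HOME")
    if not hadoop_home:
        return False
    return (Path(hadoop_home) / "bin" / "winutils.exe").exists()

if __name__ == "__main__":
    for name, value in spark_env_report().items():
        print(f"{name} = {value}")
    print("winutils.exe present:", winutils_present())
```

On my machine SPARK_HOME is unset (the docs say databricks-connect should not need it), so I'm not sure which of these Spark is actually tripping on.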
Would be happy for any references for:
- Fixing the databricks-connect issue
- Fixing the underlying pyspark installation issue