I am currently connecting Visual Studio Code to my Databricks workspace using the Databricks Connect feature (my local machine runs Windows). To do so, I followed the instructions here and here. I have gotten it to work for PySpark: the connection is established and I can execute PySpark code against my cluster.
I would like to repeat the same small example using Scala code, but I do not know how. The Databricks documentation is not exhaustive, and my build.sbt fails. The build from this tutorial fails for me as well. Following the documentation, I created a build.sbt that looks as follows:
name := "scala_test"
version := "1.0"
// sbt needs a full Scala version string; a bare "2.12" cannot be resolved
scalaVersion := "2.12.15"
// this should be set to the path returned by ``databricks-connect get-jar-dir``
unmanagedBase := new java.io.File("C:/Users/user/Anaconda3/envs/databricksEnv/lib/site-packages/pyspark/jars")
mainClass := Some("com.example.Test")
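For reference, the `com.example.Test` main class I am pointing `mainClass` at is only a minimal connectivity check, sketched along the lines of the Databricks Connect docs (the `spark.range(10)` query is just my own trivial example; the connection details come from `databricks-connect configure`, not from the code):

```scala
package com.example

import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]): Unit = {
    // Databricks Connect injects the cluster connection settings configured
    // via `databricks-connect configure`, so no master URL is set here.
    val spark = SparkSession.builder().getOrCreate()

    // A trivial query to verify that commands reach the remote cluster.
    spark.range(10).show()

    spark.stop()
  }
}
```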
I adjusted the build from the documentation to my Scala version and adapted the file path. However, the build fails with the following error:
2022.02.07 11:27:34 ERROR sbt command failed: C:\Program Files\Eclipse Adoptium\jdk-8.0.322.6-hotspot\jre\bin\java -Djline.terminal=jline.UnsupportedTerminal -Dsbt.log.noformat=true -Dfile.encoding=UTF-8 -jar
Note that I am new to Scala and not entirely familiar with builds and the like, so I am struggling to debug this issue. Here is the full output log for the Scala build in the terminal:
In general, I am a little confused about how Databricks Connect works, but I would be super happy to get it running :)