
I was running yet another execution of local Scala code against the remote Spark cluster on Databricks and got this:

Exception in thread "main" com.databricks.service.DependencyCheckWarning: The java class <something> may not be present on the remote cluster. It can be found in <something>/target/scala-2.11/classes. To resolve this, package the classes in <something>/target/scala-2.11/classes into a jar file and then call sc.addJar() on the package jar. You can disable this check by setting the SQL conf spark.databricks.service.client.checkDeps=false.

I have tried reimporting, cleaning, and recompiling the sbt project, to no avail.

Anyone know how to deal with this?

zaxme

1 Answer


Apparently the documentation has that covered:

spark.sparkContext.addJar("./target/scala-2.11/hello-world_2.11-1.0.jar")

I guess it makes sense that everything you write as code external to Spark is considered a dependency. So a simple sbt publishLocal and then pointing to the resulting jar path in the command above will sort you out.
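For context, here is a minimal sketch of where that call fits in a Databricks Connect driver program. The project name hello-world, the object name RemoteJob, and the session setup are illustrative assumptions, not from the original post; only the addJar line itself comes from the documentation.

import org.apache.spark.sql.SparkSession

object RemoteJob {
  def main(args: Array[String]): Unit = {
    // Databricks Connect picks up the cluster and connection settings from its
    // own configuration, so the session is created the usual way.
    val spark = SparkSession.builder()
      .appName("hello-world")
      .getOrCreate()

    // Ship the locally built jar (e.g. the one produced under target/scala-2.11
    // by sbt package / publishLocal) to the remote cluster so the project's
    // classes are available there.
    spark.sparkContext.addJar("./target/scala-2.11/hello-world_2.11-1.0.jar")

    // Code referencing classes from this project can now run remotely.
    spark.range(10).count()
  }
}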

My main confusion came from the fact that I didn't need to do this for a very long while, until at some point this mechanism kicked in. Rather inconsistent behavior, I'd say.


A personal observation after working with this setup: it seems you only need to publish the jar once. I have changed my code multiple times and the changes are reflected, even though I have not been republishing the jar for each change. That makes the whole task a one-off, though it is still confusing.

zaxme