I have set up databricks-connect version 5.5.0. This runtime includes Scala 2.11 and Spark 2.4.3. All the Spark code I have written has executed correctly and without any issues, until I tried calling sparkContext.wholeTextFiles - roughly the sketch below.
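For reference, this is approximately the shape of the call that fails - a minimal sketch assuming the session comes from databricks-connect; the abfss:// path is a placeholder, not the real one:

    import org.apache.spark.sql.SparkSession

    object WholeTextFilesRepro {
      def main(args: Array[String]): Unit = {
        // Session is provided through the databricks-connect configuration
        val spark = SparkSession.builder().getOrCreate()

        // Read a directory of small files as (path, content) pairs.
        // The path below is illustrative only.
        val files = spark.sparkContext
          .wholeTextFiles("abfss://abc-fs@cloud.dfs.core.windows.net/paths/something")

        files.take(5).foreach { case (path, _) => println(path) }
      }
    }

The error that I get is the following: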
Exception in thread "main" java.lang.NoClassDefFoundError: shaded/databricks/v20180920_b33d810/com/google/common/base/Preconditions
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.ensureAuthority(AzureBlobFileSystem.java:775)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:94)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:500)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:469)
at org.apache.spark.SparkContext$$anonfun$wholeTextFiles$1.apply(SparkContext.scala:997)
at org.apache.spark.SparkContext$$anonfun$wholeTextFiles$1.apply(SparkContext.scala:992)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:820)
at org.apache.spark.SparkContext.wholeTextFiles(SparkContext.scala:992)
...
Caused by: java.lang.ClassNotFoundException: shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
... 20 more
One attempt at solving the problem was to move to the latest Databricks runtime, which at the time of this writing is 6.5. That didn't help. I then went back through earlier versions - 6.4 and 6.3 - since they use different Spark versions, but to no avail.
Another thing that I tried was adding "com.google.guava" % "guava" % "23.0" as a dependency in my build.sbt, roughly as in the snippet below.
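For clarity, this is approximately how the dependency was declared (the surrounding settings are illustrative, not my exact build file):

    // build.sbt - relevant excerpt only; other settings omitted
    scalaVersion := "2.11.12"

    libraryDependencies ++= Seq(
      "com.google.guava" % "guava" % "23.0"
    )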
That in itself results in errors like:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: abfss://abc-fs@cloud.dfs.core.windows.net/paths/something, expected: file:///
I feel that going down the road of satisfying each and every dependency that somehow is not included in the jar may not be the best option.
I wonder if someone has had a similar experience and, if so, how they solved it.
I am willing to give more context if that is necessary.
Thank you!