I am using a Databricks cluster to execute my Spark application.
My application depends on a few libraries, but these libraries are not available via the Databricks "install new library" option.
I came to know that with a fat jar (uber jar) I can bundle multiple libraries and pass them to the cluster as a single artifact.
I also came to know that to create a fat jar you have to provide a main class, so I have written a simple program on my local system and added the dependencies to the build.sbt file.
I am using the 'sbt assembly' command to create the fat jar.
Please note that I am not actually using the library in my sample program.
My aim is to create a fat jar that contains all the required jars, so that my other Spark-based applications can access the libraries via this fat jar.
These are the steps I followed.
'Sample Program'
object SampleProgram {
  def main(args: Array[String]): Unit = {
    print("Hello World")
  }
}
'build.sbt'
name := "CrealyticsFatJar"
version := "0.1"
scalaVersion := "2.11.12"
// https://mvnrepository.com/artifact/com.crealytics/spark-excel
libraryDependencies += "com.crealytics" %% "spark-excel" % "0.12.0"
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
'project/assembly.sbt'
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
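For reference, this is how I build the jar locally. I am assuming sbt-assembly's default output location here (I have not overridden assemblyJarName), so the exact path may differ:

sbt assembly

which should produce target/scala-2.11/CrealyticsFatJar-assembly-0.1.jar, and that is the jar I plan to upload to the Databricks cluster as a library.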
But I am not sure whether what I am doing is correct and whether it will help me execute my Spark programs on the Databricks cluster.
Q1) One library might have dependencies on other libraries. If I mention only that library in the SBT file, will the fat jar also include its transitive dependencies?
Q2) If I am not using the libraries in this sample program, will they still be available to the other programs on the cluster?
Q3) After installing the fat jar on the cluster, how do I access the libraries? I mean, by which name would I access them in the import statement? I am guessing something like the sketch below, but I am not sure.
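For example, for the spark-excel library mentioned above, I am guessing the other application would do something roughly like this. This is only my guess based on the library's README; the format name and option names are assumptions on my part and may differ between versions:

import org.apache.spark.sql.SparkSession

object ReadExcelExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReadExcelExample").getOrCreate()

    // My guess: the library is accessed through its DataSource format name,
    // so perhaps no direct import of com.crealytics classes is needed here.
    val df = spark.read
      .format("com.crealytics.spark.excel")
      .option("useHeader", "true") // option name is my assumption; it may be "header" in newer versions
      .load("/path/to/some-file.xlsx") // hypothetical path, just for illustration
    df.show()
  }
}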
Apologies if my questions are silly. Thanks.