I am using a Databricks cluster to execute Spark applications.
My application depends on a few libraries, but these libraries are not available via the Databricks "install new library" option.
I came to know that through a fat jar (uber jar) I can bundle multiple libraries and pass them to the cluster.
I also came to know that to create a fat jar you have to provide a main class, so I have written a simple program on my local system and added the dependencies to the build.sbt file.
I am using the 'sbt assembly' command to create the fat jar.
Please note that I am not using the library in my sample program.

My aim is to create a fat jar that contains all the required jars, so that my other Spark-based applications can access the libraries through this fat jar.

Here are the steps I followed:

'Sample Program'

object SampleProgram {

  def main(args: Array[String]): Unit = {
    print("Hello World")
  }

}

'build.sbt'

name := "CrealyticsFatJar"

version := "0.1"

scalaVersion := "2.11.12"

// https://mvnrepository.com/artifact/com.crealytics/spark-excel
libraryDependencies += "com.crealytics" %% "spark-excel" % "0.12.0"

assemblyMergeStrategy in assembly := {
  // discard duplicated META-INF entries (manifests, signatures) that clash across jars
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // for any other conflicting path, keep the first occurrence
  case x => MergeStrategy.first
}

'project/assembly.sbt'

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
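
For reference, I run `sbt assembly` from the project root. If I understand the sbt-assembly defaults correctly, the jar lands at target/scala-2.11/CrealyticsFatJar-assembly-0.1.jar, and the name can be pinned in build.sbt (this setting key is from sbt-assembly 0.14.x; the file name below is just an example):

// hypothetical name; the default is <name>-assembly-<version>.jar
assemblyJarName in assembly := "crealytics-fat-jar.jar"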

But I am not sure whether what I am doing is correct and whether it will help to execute my Spark programs on the Databricks cluster.

Q1) One library might have dependencies on other libraries, so if I mention the library name in SBT, would it also load the other dependent libraries?
Q2) If I am not using the libraries in this program, would they be available to the other programs on the cluster?
Q3) After installing the fat jar on the cluster, how do I access the libraries? I mean, by which name would I access the libraries, i.e. what is the import statement?

Apologies if my questions are silly. Thanks.

BlueStar
    _"BT then would it load other dependent libraries"_ yes, no worry about that. _"would it be available to the other program of the cluster."_ No, each program has its own classpath, that is the key word to learn more about that. - "would it be available to the other program of the cluster"_ you are not installing it, you just run it. If you want to install many libraries on your cluster that is a different story, you would need to upload ll the jars to a common location, again learn more about classpath to understand how this works. – Luis Miguel Mejía Suárez Mar 18 '20 at 19:19
  • Finally, that is not the correct way. You need to mark the **Spark** dependencies as `Provided` and exclude the Scala standard library from the fat jar: `assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)` (see the sketch after these comments). – Luis Miguel Mejía Suárez Mar 18 '20 at 19:20
  • @Luis Miguel Mejía Suárez can you please help me out with this issue? I am still not sure how to deal with this kind of problem. – BlueStar Mar 18 '20 at 19:24
  • Sorry, I do not understand: what problem? What you are doing is correct: you create an **SBT** project with your application code and your dependencies, create the fat jar, and deploy it to your cluster. I just warned you that you need to mark the Spark base dependencies as `Provided` and that the Scala standard library has to be excluded from that jar. – Luis Miguel Mejía Suárez Mar 18 '20 at 19:28
  • @Luis Miguel Mejía Suárez can I create a fat jar without writing any sample program? – BlueStar Mar 18 '20 at 19:40
  • And how would I let my Spark application know where the library actually exists, i.e. the import statement? – BlueStar Mar 18 '20 at 19:44
  • I mean, the idea is that you create a **jar** with all the dependencies for each of your programs. If what you want to do is install some dependencies on a cluster so they are available to Jupyter notebooks or other applications, then it would be better to rewrite the question to make it clearer. – Luis Miguel Mejía Suárez Mar 18 '20 at 20:07
  • @Luis Miguel Mejía Suárez So basically nothing is wrong. I had a few doubts, but most of the things are clear to me now. – BlueStar Mar 19 '20 at 09:58
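
Following up on the comments above, here is a minimal sketch of what the suggested build.sbt changes might look like. The Spark modules and version below are placeholders and should match the Databricks runtime; only the advice itself (mark Spark as `Provided`, exclude the Scala standard library) comes from the comments:

// Spark is already on the Databricks cluster, so mark it Provided:
// it is available at compile time but left out of the fat jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.4.3" % Provided,
  "org.apache.spark" %% "spark-sql"   % "2.4.3" % Provided,
  "com.crealytics"   %% "spark-excel" % "0.12.0"          // bundled into the fat jar
)

// The cluster also ships the Scala standard library, so exclude it as well.
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

Regarding Q3: once the jar is installed on the cluster, the bundled libraries are addressed by their normal package names; for spark-excel that should be e.g. `spark.read.format("com.crealytics.spark.excel")`.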

0 Answers