I started building a Spark Streaming job and got a producer working against a Kinesis endpoint. I then started on a consumer, but ran into problems building it.

I am using the sbt-assembly plugin to create a single jar that contains all the dependencies. The project's dependencies are as follows:

libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.4.1" % "provided",
"org.apache.spark" %% "spark-sql" % "1.4.1" % "provided",
"org.apache.spark" %% "spark-streaming" % "1.4.1" % "provided",
"org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.4.1",
"org.scalatest" %% "scalatest" % "2.2.1" % "test",
"c3p0" % "c3p0" % "0.9.1.+",
"com.amazonaws" % "aws-java-sdk" % "1.10.4.1",
"mysql" % "mysql-connector-java" % "5.1.33",
"com.amazonaws" % "amazon-kinesis-client" % "1.5.0"

)

When I run assembly, the code compiles, but the build fails during the merge phase with this error:

[error] (streamingClicks/*:assembly) deduplicate: different file contents found in the following:
[error] /Users/adam/.ivy2/cache/org.apache.spark/spark-network-common_2.10/jars/spark-network-common_2.10-1.4.1.jar:META-INF/maven/com.google.guava/guava/pom.properties
[error] /Users/adam/.ivy2/cache/com.google.guava/guava/bundles/guava-18.0.jar:META-INF/maven/com.google.guava/guava/pom.properties

This happens once I add the spark-streaming-kinesis-asl dependency. How do I get around it? I could mark the dependency as provided and add the jar to the classpath myself, but that's really not something I want to do.

Adam Ritter

1 Answer

I don't know the exact solution for your case, but you will need to adjust the merge strategy for these dependencies. It could look something like this:

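// Matches any path under META-INF; used by the meta(_) pattern below.
val meta = """META.INF(.)*""".r

lazy val strategy = assemblyMergeStrategy in assembly := {
  // Concatenate config files instead of picking one.
  case "application.conf" => MergeStrategy.concat
  // Drop META-INF entries, including the conflicting pom.properties.
  case meta(_) => MergeStrategy.discard
  // For any other duplicate, keep the first copy found.
  case _ => MergeStrategy.first
}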
lazy val core = (project in file("core")).
  settings(
    name := "core",
    libraryDependencies ++= Seq(
     ...
    ),
    strategy
  )
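
If discarding everything under META-INF turns out to be too broad, a narrower rule may be enough. The following is only a sketch, assuming the conflict shown in the question (the two `pom.properties` files under `META-INF/maven`) is the only one, and assuming sbt-assembly 0.12 or later, where `PathList` and `assemblyMergeStrategy` are available in the build definition. It discards Maven metadata and falls back to the plugin's default strategy for everything else:

```scala
// build.sbt fragment (sketch). In a multi-project build, put this in the
// settings of the project that runs assembly.
assemblyMergeStrategy in assembly := {
  // Files under META-INF/maven are build metadata only (pom.xml,
  // pom.properties), so dropping them resolves the reported conflict.
  case PathList("META-INF", "maven", xs @ _*) => MergeStrategy.discard
  // Everything else keeps the plugin's default behaviour.
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

Deferring to the default strategy keeps entries such as `META-INF/services` files intact, which a blanket discard would remove.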

P.S. I also recommend assembling all dependencies in a separate "fat jar" project and keeping your own code in a project that depends on it. You can then put the fat jar on HDFS and package your actual code with sbt package. To run your code, supply the fat jar alongside your package using the --jars option.
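
A rough sketch of that layout is below. The project names `deps` and `app` are hypothetical, and the dependency list is only an excerpt of the one in the question; adjust both to your build.

```scala
// build.sbt sketch; `deps` and `app` are hypothetical project names.

// Holds only the non-provided dependencies. Running assembly on this
// project produces the fat jar.
lazy val deps = (project in file("deps")).
  settings(
    name := "deps",
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.4.1",
      "mysql" % "mysql-connector-java" % "5.1.33"
    )
  )

// Holds your own code. Running package on this project produces a small
// jar that contains only your classes.
lazy val app = (project in file("app")).
  settings(
    name := "app",
    libraryDependencies +=
      "org.apache.spark" %% "spark-streaming" % "1.4.1" % "provided"
  ).
  dependsOn(deps)

// Typical workflow (placeholders in angle brackets):
//   sbt deps/assembly
//   sbt app/package
//   spark-submit --class <main class> --jars <fat jar> <app jar>
```

The fat jar then only needs rebuilding when the dependencies change, while the small application jar is quick to repackage.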

Nikita