
I am working on Spark 1.3.0. My build.sbt looks as follows:

libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.3.0" % "provided",
  "org.springframework.security" % "spring-security-web" % "3.0.7.RELEASE",
  "com.databricks" % "spark-csv_2.10" % "1.4.0"
)

// META-INF discarding
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
  }
}

With this sbt file, Hadoop 2.2.0 is being pulled in during compilation, but my runtime environment has Hadoop 2.6.0. Can anyone help me exclude the Hadoop dependency from the Spark libraries and specify Hadoop 2.6.0 in the sbt file instead?
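For what it's worth, this is roughly the kind of change I have in mind, but I have not verified it (the exclusion rule and the explicit hadoop-client coordinate are guesses on my part):

libraryDependencies ++= Seq(
  // guess: drop the Hadoop artifacts that Spark pulls in transitively...
  ("org.apache.spark" %% "spark-core" % "1.3.0" % "provided")
    .excludeAll(ExclusionRule(organization = "org.apache.hadoop")),
  // ...and pin the Hadoop version that matches my cluster
  "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "provided"
)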

Thanks

Alok

1 Answer


I don't think the Spark packages bring Hadoop into your assembly: they are marked "provided", so your build will not contain the Hadoop client libraries. You will have to run your application with spark-submit from a Spark installation. When you download Spark, make sure you pick a build that supports Hadoop 2.6.
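For instance, a minimal spark-submit invocation against such an installation might look like the following (the class name, master URL, and jar path are placeholders):

# run on YARN using the Spark build that matches your Hadoop 2.6 cluster
$SPARK_HOME/bin/spark-submit \
  --class com.example.Main \
  --master yarn-client \
  target/scala-2.10/myapp-assembly-1.0.jar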

Spark 1.3.0 (2015-03-15) does not have a Hadoop 2.6 build. The earliest Spark version to provide a Hadoop 2.6 build is Spark 1.3.1 (2015-04-17).

These are both ancient versions of Spark with a lot of known bugs that have been fixed since then. Unless you like bugs, I suggest using Spark 1.6.2 or 2.0.0.

Daniel Darabos