
We use Spark a lot for our Scala applications. If I'm testing locally, my library dependencies are:

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.1",
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.1",

whereas if I'm building a jar to deploy I use:

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.1" % "provided",
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.1" % "provided",

Due to the nature of the work, we may sometimes have to flip back and forth a few times while trying different things. It's inevitable that at some point I'll forget to change the build file and end up wasting time. It's not a lot of time, but enough to prompt me into asking this question.

So, is anyone aware of a way (excluding remembering to "do it right") of having the build file update the "provided" value depending on a trigger? Perhaps a configuration option that reads "test" or "live", for example?

Thanks in advance.


1 Answer


I have just done a dynamic build with two different Spark versions in my own project, where I needed to use two different versions based on a specific condition.

You can do this in two ways. Since you need to provide input one way or another, you'll need to use command-line parameters.

1) Using build.sbt itself.

a) You can define a parameter with the name "sparkVersion".

b) Read that parameter in build.sbt. (You can write Scala code in build.sbt; it gets compiled as Scala at build time anyway.)

c) Define the conditional dependencies as below.

// Read the property passed on the command line with -DsparkVersion=...
val sparkVersion = Option(System.getProperty("sparkVersion")).getOrElse("default")

if (sparkVersion == "newer") {
  println(" newer one")
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
} else {
  println(" default one")
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
}

You can play with all the build options as you wish.
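
For example, adapting the same trick to the Spark dependencies from the question (a minimal sketch, assuming a hypothetical "liveOrTest" system property; the artifacts and versions are the ones from the question):

val liveOrTest = Option(System.getProperty("liveOrTest")).getOrElse("test")

// Sketch: mark the Spark artifacts as "provided" only for live/deploy builds
libraryDependencies ++= {
  val sparkDeps = Seq(
    "org.apache.spark" % "spark-core_2.10" % "1.6.1",
    "org.apache.spark" % "spark-sql_2.10" % "1.6.1"
  )
  if (liveOrTest == "live") sparkDeps.map(_ % "provided") else sparkDeps
}

Running sbt with no flag then gives you the default (non-provided) dependencies for local testing.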

2) Using a build.scala file. You can create a build.scala file in project/build.scala.

You can write the code below:

import sbt._
import Keys._

object MyBuild extends Build {
  // Read the property passed on the command line with -DscalaTestVersion=...
  val myOptionValue = Option(System.getProperty("scalaTestVersion")).getOrElse("defaultValue")

  // Pick a dependency version based on the property
  val depVersion = if (myOptionValue == "newer") {
    println(" asked for newer version")
    "2.2.6"
  } else {
    println(" asked for older/default version")
    "2.2.0"
  }

  val dependencies = Seq(
    "org.scalatest" %% "scalatest" % depVersion % "test"
  )

  lazy val exampleProject = Project("SbtExample", file(".")).settings(
    version       := "1.2",
    scalaVersion  := "2.10.4",
    libraryDependencies ++= dependencies
  )
}

After this, just run the build command as below:

sbt -DsparkVersion=newer -DscalaTestVersion=newer clean compile

I have given the build command for both approaches; note that the -D options must come before the sbt commands. You can choose either one and pass only that option. Please write to me if you need any help.

For resolving duplicates in the assembly build, you can add the following to build.sbt:

mergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf")          => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")      => MergeStrategy.discard
  case "log4j.properties"                                  => MergeStrategy.discard
  case "log4j-defaults.properties"                         => MergeStrategy.discard
  case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
  case "reference.conf"                                    => MergeStrategy.concat
  case _                                                   => MergeStrategy.first
}
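
With the merge strategy in place, the deploy build from the sketch above can be run by passing the property before the sbt commands (again, liveOrTest is the hypothetical property from that sketch):

sbt -DliveOrTest=live clean assembly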

Working with this will show you how good and magical sbt is.

Srini
  • That's fantastic! Thank you so much. I'll play with it tomorrow, but it looks like I'm going to be rewriting the entire build file. :D – null Jun 28 '16 at 20:20
  • Hi Srini, I've just tried to implement your solution but have run into difficulties. The command I usually run to build the jar is 'sbt assembly' with the Spark files marked as provided. Using your solution I received a deduplicate error; I've tried various permutations using -D variables but it either tells me it's not defined or errors. – null Jun 29 '16 at 15:25
  • Ok, then you need to add logic to resolve the duplicates. You can simply add the edit provided in the answer. – Srini Jun 29 '16 at 15:27
  • Hi Srini, it hasn't, no. It was similar to my merge strategy, and your suggestion was more thorough, but it still failed. :( My build logic is: //live or test dependencies if (liveOrTest == "live") { println("Setting Live Values") libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.1" % "provided" libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.1" % "provided" } else { libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.1" libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.1" }, – null Jun 29 '16 at 15:43
  • Hi Srini, apologies for the late reply, it went crazy :) If I run sbt assembly -DliveOrTest=live; The error is: [error] deduplicate: different file contents found in the following: [error] C:\Users\Steve\.ivy2\cache\org.apache.hadoop\hadoop-yarn-common\jars\hadoop-yarn-common-2.2.0.jar:org/apache/hadoop/yarn/util/package-info.class along with many other deduplicate error. – null Jun 30 '16 at 08:28
  • If I just run sbt compile -DliveOrTest=live then I get the dedupe errors along with [error] Not a valid command: live (similar: alias) [error] Not a valid project ID: live [error] Expected ':' (if selecting a configuration) [error] Not a valid key: live (similar: deliver, offline, licenses) [error] live [error] ^ [error] Not a valid command: DliveOrTest [error] Not a valid project ID: DliveOrTest [error] Expected ':' (if selecting a configuration) [error] Not a valid key: DliveOrTest (similar: deliver) [error] DliveOrTest [error] ^ – null Jun 30 '16 at 08:29
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/116093/discussion-between-srini-and-null). – Srini Jun 30 '16 at 12:02