
I am developing a Spark application which uses xgboost4j: https://github.com/dmlc/xgboost/tree/master/jvm-packages

This package has to be compiled for the local architecture because the jar contains native C dependencies. But the cluster has a different architecture than the development laptop. How can I substitute the package with one built on the cluster when running sbt assembly? Or would you suggest solving this via a % "provided" dependency?

Georg Heiler
  • Can you provide different classifiers or artifact IDs for the artifacts from the target environment and the development environment? If yes, then you can use a system property to switch between artifact IDs (as mentioned in the other question - I don't want to post an answer before confirmation ;) ) – T. Gawęda Nov 24 '16 at 09:50
  • you mean stages? I think this should be possible. – Georg Heiler Nov 24 '16 at 10:04

1 Answer


Use a suffix for the (provided/compile) libraries, like this:

// "provided" when sbt is started with -Dprovided, "compile" otherwise
val suffix = if (sys.props.contains("provided")) "provided" else "compile"

libraryDependencies += "org.apache.spark" %% "spark-sql" % Spark.version % suffix

and run sbt -Dprovided assembly when those dependencies should be marked provided, i.e. left out of the uberjar because the cluster already supplies them; a plain sbt assembly keeps them in the uberjar.
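Applied to the xgboost4j dependency from the question, the same flag can control whether the locally built jar goes into the uberjar. A minimal sketch, assuming the jar is published locally under the ml.dmlc group ID with version 0.7 (adjust to whatever coordinates you actually publish it under):

// sketch: scope the locally published xgboost4j jar the same way as spark-sql
libraryDependencies += "ml.dmlc" % "xgboost4j" % "0.7" % suffix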

FaigB
  • but this will compile for the local architecture? Or is the compilation executed on the fly? – Georg Heiler Nov 24 '16 at 10:05
  • @GeorgHeiler It will use the previously compiled dependency. That was my question - if you can provide two different JARs before the project build, then this answer is correct (it is also what I suggested before, but FaigB was first, so I will not post a duplicate). If you MUST compile the module on the fly (during compilation of your application), I think this answer is not correct – T. Gawęda Nov 24 '16 at 10:08
  • so I should build the xgboost4j jar once locally with the name xgboost-local and once for the cluster as xgboost-cluster? – Georg Heiler Nov 24 '16 at 10:11
  • Two options: you can do as you wrote, or you can build with the same artifact ID but mark it provided in production and add the jar via the --jars option of spark-submit. The version in the local repository will be for developer machines, while the jar on the cluster will be compiled only for the cluster and not included in your application's uber jar. Both options are ok. There will only be a problem if you want to compile xgboost each time you build your module – T. Gawęda Nov 24 '16 at 10:34
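A minimal sketch of the first option described in the comments above, reusing the artifact names from the earlier comment (xgboost-local / xgboost-cluster); the ml.dmlc group ID, the 0.7 version, and the -Dcluster property name are assumptions, not part of the original discussion:

// pick the architecture-specific artifact via a system property, e.g. sbt -Dcluster assembly
val xgbArtifact = if (sys.props.contains("cluster")) "xgboost-cluster" else "xgboost-local"
libraryDependencies += "ml.dmlc" % xgbArtifact % "0.7"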