
I am currently working on a project in Spark 2.1.0 and I need to import a library on which Spark itself already depends. In particular, I want org.roaringbitmap:RoaringBitmap:0.7.42 to replace org.roaringbitmap:RoaringBitmap:0.5.11, the version on which both org.apache.spark:spark-core_2.11:2.1.0.cloudera1 and org.apache.spark:spark-sql_2.11:2.1.0.cloudera1 depend.

My dependencies in build.gradle are the following:

dependencies {
    compile 'org.apache.spark:spark-core_2.11:2.1.0.cloudera1'
    runtime ('org.apache.spark:spark-core_2.11:2.1.0.cloudera1') {
        exclude group: 'org.roaringbitmap'
    }
    compile 'org.apache.spark:spark-sql_2.11:2.1.0.cloudera1'
    runtime ('org.apache.spark:spark-sql_2.11:2.1.0.cloudera1') {
        exclude group: 'org.roaringbitmap'
    }
    compile 'org.roaringbitmap:RoaringBitmap:0.7.42'
    implementation 'org.roaringbitmap:RoaringBitmap'
    constraints {
        implementation('org.roaringbitmap:RoaringBitmap:0.7.42') {
            because 'because of transitive dependency'
        }
    }
}

The output of gradle -q dependencyInsight --dependency org.roaringbitmap shows that the dependency has been updated:

org.roaringbitmap:RoaringBitmap -> 0.7.42
   variant "default+runtime" [
      org.gradle.status = release (not requested)
      Requested attributes not found in the selected variant:
         org.gradle.usage  = java-api
   ]
\--- compileClasspath

org.roaringbitmap:RoaringBitmap:0.5.11 -> 0.7.42
   variant "default+runtime" [
      org.gradle.status = release (not requested)
      Requested attributes not found in the selected variant:
         org.gradle.usage  = java-api
   ]
\--- org.apache.spark:spark-core_2.11:2.1.0.cloudera1
     +--- compileClasspath
     +--- org.apache.spark:spark-sql_2.11:2.1.0.cloudera1
     |    \--- compileClasspath
     \--- org.apache.spark:spark-catalyst_2.11:2.1.0.cloudera1
          \--- org.apache.spark:spark-sql_2.11:2.1.0.cloudera1 (*)

Unfortunately, when I run my application with spark2-submit, the version actually loaded at runtime is still org.roaringbitmap:RoaringBitmap:0.5.11.

How can I force my application to use the desired version of RoaringBitmap?

w4bo
I cannot answer the specifics, but are you sure that "forcing" the version is the way to go? It will work only if 0.5.11 and 0.7.42 are binary compatible, that is, the class names, method names, argument types, and return types of every reachable piece of code are exactly the same. If Spark calls RoaringBitmap's `A#whatever()` in 0.5.11 and this method has changed (or was deleted) in 0.7.42, it **will** crash. Your best bet would be to rework your app using Spark's version, or to use [shading](https://softwareengineering.stackexchange.com/questions/297276/what-is-a-shaded-java-dependency). – GPI Apr 05 '19 at 13:58
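One way to spot-check this concern is to ask the JVM, at runtime, whether a method you rely on actually exists in whichever RoaringBitmap version got loaded. The sketch below uses plain Java reflection; `ApiCheck.hasMethod` is a hypothetical helper, not part of Spark or RoaringBitmap. It matches on method name and parameter types only, so it is a rough check rather than a full binary-compatibility test (the JVM also links on return type).

```scala
// Hypothetical helper (not from any library): does the loaded version of
// `className` expose a public method with this name and these exact parameter types?
object ApiCheck {
  def hasMethod(className: String, methodName: String, paramTypes: Class[_]*): Boolean =
    try {
      // initialize = false: look the class up without running static initializers
      val cls = Class.forName(className, false, getClass.getClassLoader)
      cls.getMethod(methodName, paramTypes: _*) // throws if no such public method
      true
    } catch {
      case _: ClassNotFoundException | _: NoSuchMethodException => false
    }
}
```

For instance, evaluating `ApiCheck.hasMethod("org.roaringbitmap.RoaringBitmap", "add", classOf[Int])` in spark2-shell checks for `add(int)`; substitute whichever methods your own code actually calls.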

2 Answers


I believe the CDH-provided libraries take precedence over your libraries anyway.

You can check this by running the following code in spark2-shell:

// List the driver's system classpath (on Java 8 the system class loader is a URLClassLoader)
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
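A narrower check than listing the whole classpath is to ask where one particular class was loaded from. The helper below is a sketch; `WhereLoaded.locationOf` is a made-up name, and it relies only on the standard `getProtectionDomain.getCodeSource` API. JDK bootstrap classes have no code source, so it returns `None` for those as well as for classes that cannot be found.

```scala
import scala.util.Try

// Hypothetical helper (not from any library): report the jar or directory
// a class was loaded from, as seen by this object's class loader.
object WhereLoaded {
  def locationOf(className: String): Option[String] =
    for {
      // initialize = false: find the class without running static initializers
      cls <- Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      // bootstrap (JDK) classes have no CodeSource, so guard against null
      src <- Option(cls.getProtectionDomain.getCodeSource)
      url <- Option(src.getLocation)
    } yield url.toString
}
```

Running `WhereLoaded.locationOf("org.roaringbitmap.RoaringBitmap")` in spark2-shell, or printing it from inside the submitted job, should show whether the class comes from the CDH Spark jars or from your application jar.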

Generally, I use the shade plugin to work around this.

Mikita Harbacheuski

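Spark has options to prioritize the user class path over its own: `spark.driver.userClassPathFirst` and `spark.executor.userClassPathFirst` (both marked experimental). See also the related question "Classpath resolution between spark uber jar and spark-submit --jars when similar classes exist in both".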
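Conceptually, "user class path first" means child-first (parent-last) class loading: the loader holding the application jars is consulted before its parent. The following is a minimal illustration of that idea as a plain `URLClassLoader` subclass; `ChildFirstClassLoader` is a hypothetical name and this is not Spark's own implementation. Platform packages (`java.`, `javax.`, `scala.`) are still delegated to the parent, since a second copy of them would conflict with the running JVM and Scala runtime.

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical child-first class loader, for illustration only (not Spark's code).
class ChildFirstClassLoader(urls: Array[URL], parent: ClassLoader)
    extends URLClassLoader(urls, parent) {

  // Packages that must always come from the parent loader.
  private val parentFirstPrefixes = Seq("java.", "javax.", "scala.")

  override def loadClass(name: String, resolve: Boolean): Class[_] =
    getClassLoadingLock(name).synchronized {
      // Reuse a class this loader has already defined, if any.
      val cls: Class[_] = Option(findLoadedClass(name)).getOrElse {
        if (parentFirstPrefixes.exists(p => name.startsWith(p))) {
          // Platform classes: normal parent-first delegation.
          super.loadClass(name, false)
        } else {
          try findClass(name) // 1. look in our own URLs first
          catch {
            // 2. not found locally: fall back to the parent.
            case _: ClassNotFoundException => super.loadClass(name, false)
          }
        }
      }
      if (resolve) resolveClass(cls)
      cls
    }
}
```

With two copies of a library on the classpath, objects passed between code that sees one copy and code that sees the other can fail with `ClassCastException` or `LinkageError`, which is one reason shading is often the safer route.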

Most likely you should also look into shading.

Georg Heiler