What is the class-loading precedence when both my Spark application's uber jar and the contents of the --jars option to my spark-submit shell command contain similar dependencies?

I ask this from a third-party library integration standpoint. If I set --jars to use version 2.0 of a third-party library, and the uber jar passed to this spark-submit script was assembled using version 2.1, which class is loaded at runtime?

At present, I am considering keeping my dependencies on HDFS and adding them to the --jars option of spark-submit, while asking users, via end-user documentation, to set the scope of this third-party library to 'provided' in their Spark application's Maven POM file.
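To illustrate that approach, the user-facing POM entry could look like the following. This is a minimal sketch; the groupId, artifactId, and version are placeholders for the actual third-party library.

```xml
<!-- In the user's application pom.xml: mark the third-party library
     as 'provided' so it is compiled against but NOT bundled into the
     uber jar; the copy distributed via --jars is used at runtime instead.
     groupId/artifactId/version below are placeholders. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>third-party-lib</artifactId>
  <version>2.0</version>
  <scope>provided</scope>
</dependency>
```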

Sudarshan Thitte

1 Answer


This is controlled to some extent by two parameters:

  • spark.driver.userClassPathFirst &
  • spark.executor.userClassPathFirst

If set to true (the default is false), then, from the docs:

(Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only.

I wrote some of the code that controls this, and there were a few bugs in the early releases, but if you're using a recent Spark release it should work (although it is still an experimental feature).
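For example, a submission that prefers the user-supplied jars could look like this. The class name, master, and jar paths are placeholders, not values from the question:

```shell
# Ask Spark to load user-added jars before its own when resolving
# classes, on both the driver and the executors. Per the docs, the
# driver-side flag applies in cluster mode only.
# Jar paths and class name below are placeholders.
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars hdfs:///libs/third-party-lib-2.0.jar \
  my-app-assembly.jar
```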

Holden
  • thanks @holden - I'll look into this. Suppose my binary cannot conflict with what Spark ships with, since Spark doesn't know about it; will this still be the parameter to use? [e.g. if my binary is third-party and not one of yarn or the other open-source binaries that Spark ships with] – Sudarshan Thitte Jul 08 '15 at 21:00
  • 5
    Please note that ```The configuration key 'spark.yarn.user.classpath.first' has been deprecated as of Spark 1.3 and may be removed in the future. Please use spark.{driver,executor}.userClassPathFirst instead.``` – placeybordeaux May 16 '16 at 23:37
  • 1
    @Holden Why are the `userClassPathFirst` options marked as "experimental"? Could the option be withdrawn? Is there a drawback to using this option? I seem to quite easily get into situation where I need a newer version of `gson` or `snakeyaml` than is shipped with Spark. It seems an important option to me. – Frank Wilson Sep 28 '18 at 10:03
  • 1
    This did not help me solve the problem on Spark version 2.3.0; I am getting the same error – Abhishek Gupta Mar 12 '19 at 13:49
  • Also, the experimental label mostly means that its behaviour may change (for example, in the current master branch the feature is implemented differently for new JVMs). – Holden Mar 20 '19 at 10:59
  • If I provide jars in the --jars option and the jars contain the same class name, which class would be picked during execution? Spark version 2.4. – Omkar Rahane Feb 08 '22 at 17:51