How to override spark jars while running spark-submit command in cluster mode? (okhttp3)

Question

A jar in my project conflicts with a jar in the spark-2.4.0 jars folder. My Retrofit dependency brings in okhttp-3.13.1.jar (verified with mvn dependency:tree), but Spark on the server ships okhttp-3.8.1.jar, and I get a NoSuchMethodError. So I'm trying to pass my jar explicitly to override it.

When I run spark-submit in client mode, it picks up the jar that I provide explicitly. But when I run the same command in cluster mode, the jar is not overridden on the worker nodes: the executors keep using Spark's older jar, which leads to the NoSuchMethodError.
My jar is a fat jar, but the Spark jar somehow takes precedence over it. If I could delete the jars provided by Spark, it would probably work, but I can't, as other services may be using them.

Following is my command:

./spark-submit --class com.myJob \
  --conf spark.yarn.appMasterEnv.ENV=uat \
  --conf spark.driver.memory=12g \
  --conf spark.executor.memory=40g \
  --conf spark.sql.warehouse.dir=/user/myuser/spark-warehouse \
  --conf "spark.driver.extraClassPath=/home/test/okhttp-3.13.1.jar" \
  --conf "spark.executor.extraClassPath=/home/test/okhttp-3.13.1.jar" \
  --jars /home/test/okhttp-3.13.1.jar \
  --conf spark.submit.deployMode=cluster \
  --conf spark.yarn.archive=hdfs://namenode/frameworks/spark/spark-2.4.0-archives/spark-2.4.0-archive.zip \
  --conf spark.master=yarn \
  --conf spark.executor.cores=4 \
  --queue public \
  file:///home/mytest/myjar-SNAPSHOT.jar

final Retrofit retrofit = new Retrofit.Builder()
        .baseUrl(configuration.ApiUrl()) // this call throws the NoSuchMethodError
        .addConverterFactory(JacksonConverterFactory.create(new ObjectMapper()))
        .build();

My mvn dependency:tree output doesn't show any other transitive copy of this jar in my build, and the job runs fine locally in IntelliJ as well as with mvn clean install.

I even tried providing an HDFS path for the jar (hdfs://users/myuser/myjars/okhttp-3.13.1.jar), with no luck. Can someone shed some light?

I get the following exception if I set both --conf "spark.driver.userClassPathFirst=true" and --conf "spark.executor.userClassPathFirst=true":

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<init>(YarnSparkHadoopUtil.scala:48)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<clinit>(YarnSparkHadoopUtil.scala)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply$mcJ$sp(Client.scala:81)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:81)
    at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:81)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:80)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl cannot be cast to org.apache.hadoop.yarn.api.records.Priority
    at org.apache.hadoop.yarn.api.records.Priority.newInstance(Priority.java:39)
    at org.apache.hadoop.yarn.api.records.Priority.<clinit>(Priority.java:34)
    ... 15 more

But if I set only --conf "spark.executor.userClassPathFirst=true", the job hangs.

Have you tried setting `spark.driver.userClassPathFirst=true` / `spark.executor.userClassPathFirst=true`? Just beware of potentially disastrous side effects :) - mazaneicha, Apr 10 '20 at 20:35
I tried the suggested settings and I get another exception. I've added it to the description now; please check. - Saawan, Apr 11 '20 at 08:37
That's what disastrous side effects look like, especially when you have built a fat jar with dependency versions different from the target environment. - mazaneicha, Apr 11 '20 at 13:42
Why does the jar override work fine in client mode even without userClassPathFirst=true, but not in cluster mode? - Saawan, Apr 11 '20 at 15:00
@Saawan Is the jar placed in the two HDFS locations below? /home/test/okhttp-3.13.1.jar and /home/svc_mars_mds/test/okhttp-3.13.1.jar. Also, why is spark-2.4.0-archive.zip being passed? Won't the dependent jars be used from the cluster nodes themselves? - yammanuruarun, Apr 11 '20 at 18:42
"Why is spark-2.4.0-archive.zip being passed? Won't the dependent jars be used from the cluster nodes themselves?" Could you please elaborate? Are you suggesting this is not required? - Saawan, Apr 12 '20 at 08:04

score 2 · Answer 1 · edited May 31 '23 at 03:38

I solved the issue using the Maven Shade plugin.

Ignore Spark Cluster Own Jars

Reference video:

https://youtu.be/WyfHUNnMutg?t=23m1s

I followed the answer given here and added the following. Even in the source code for SparkSubmit, you can see that a jar passed with --jars is appended to the overall jar list, so those options never override Spark's own jars; they only add the jar to the list.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L644

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <relocation>
                <pattern>okio</pattern>
                <shadedPattern>com.shaded.okio</shadedPattern>
              </relocation>
              <relocation>
                <pattern>okhttp3</pattern>
                <shadedPattern>com.shaded.okhttp3</shadedPattern>
              </relocation>
            </relocations>
            <filters>
              <filter>
                <artifact>*:*</artifact>
                <excludes>
                  <exclude>META-INF/*.SF</exclude>
                  <exclude>META-INF/*.DSA</exclude>
                  <exclude>META-INF/*.RSA</exclude>
                  <exclude>log4j.properties</exclude>
                </excludes>
              </filter>
            </filters>
          </configuration>
        </execution>
      </executions>
    </plugin>
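
To check that the relocation took effect, it can help to print which jar a class is actually loaded from. The following is only a minimal sketch: the class name `ClassOrigin` and its `originOf` helpers are hypothetical names made up for illustration, and the relocated name `com.shaded.okhttp3.OkHttpClient` assumes the `shadedPattern` values from the configuration above. It uses only standard JDK calls.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /** Returns the jar or directory a class was loaded from, or a marker if none is recorded. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK classes loaded by the bootstrap loader have no code source.
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    /** Looks the class up by name (without initializing it) so absent classes are reported, not thrown. */
    static String originOf(String className) {
        try {
            Class<?> type = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            return originOf(type);
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // Assumption: with the relocations above, the shaded name should point at the
        // application fat jar, while the original name points at whatever Spark ships.
        System.out.println("okhttp3.OkHttpClient            -> " + originOf("okhttp3.OkHttpClient"));
        System.out.println("com.shaded.okhttp3.OkHttpClient -> " + originOf("com.shaded.okhttp3.OkHttpClient"));
    }
}
```

The `main` method only reports on the JVM it runs in. To see what the executors load, the same `originOf` call would have to be made from code that runs inside a task and written to the executor log.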
