I'm building a Spark application and I want to generate two separate shaded .jar files, one for each of these contexts:
- For `master=local` mode, I want a single .jar file that can be executed with `java -jar shaded-for-local-mode.jar`. This should include all of the dependencies, including the Spark and Hadoop dependencies in use.
- For distributed mode, I want a single .jar file that excludes the Spark and Hadoop libraries (`org.apache.spark:*` and `org.apache.hadoop:*`) so that they do not conflict with the runtime environment provided by `spark-submit` (spark docs), but I also want to include one dependency from the `org.apache.hadoop` group (`org.apache.hadoop:hadoop-aws`) because it isn't provided by the runtime environment (an example invocation follows this list).
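For concreteness, I expect to launch the distributed jar with something like this, where `com.example.MyApp` and the jar name are placeholders for my actual main class and artifact:

spark-submit --master yarn --class com.example.MyApp target/myapp-1.0-shade-spark-submit.jar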
This answer explained how to create two jars by using two separate `<execution>` blocks, but I'm having trouble getting the excludes to work the way I want. Here's the relevant shade `<execution>`:
<execution>
  <id>shade-spark-submit</id>
  <phase>package</phase>
  <goals>
    <goal>shade</goal>
  </goals>
  <configuration>
    <shadedClassifierName>shaded-spark-submit</shadedClassifierName>
    <artifactSet>
      <excludes>
        <exclude>org.apache.spark:*</exclude>
        <!-- We want hadoop-aws, an explicit dependency of this project,
             but not any of the other hadoop artifacts. -->
        <exclude>org.apache.hadoop:hadoop-auth</exclude>
        <exclude>org.apache.hadoop:hadoop-common</exclude>
        <exclude>org.apache.hadoop:hadoop-annotations</exclude>
      </excludes>
    </artifactSet>
    <finalName>${project.artifactId}-${project.version}-shade-spark-submit</finalName>
  </configuration>
</execution>
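For comparison, the local-mode `<execution>` is the same pattern with no `<artifactSet>` at all; roughly this (the id, finalName, and main class are just the naming I'm using):

<execution>
  <id>shade-local-mode</id>
  <phase>package</phase>
  <goals>
    <goal>shade</goal>
  </goals>
  <configuration>
    <!-- No artifactSet: bundle everything, including the Spark and Hadoop dependencies. -->
    <transformers>
      <!-- Set Main-Class in the manifest so the jar runs with java -jar. -->
      <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
        <mainClass>com.example.MyApp</mainClass>
      </transformer>
    </transformers>
    <finalName>${project.artifactId}-${project.version}-shade-local-mode</finalName>
  </configuration>
</execution>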
My concern is that these `<exclude>` statements explicitly exclude specific artifacts that I happen to know about, which is subtly different from my goal: I want to always exclude `org.apache.hadoop:*` (even artifacts I don't know about) but include `org.apache.hadoop:hadoop-aws`. The maven-shade-plugin documentation does not fully describe how `<include>` and `<exclude>` tags are processed together.
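Ideally I could express the intent directly with something like the sketch below, but I don't know whether the plugin treats an `<include>` as an exception to a wildcard `<exclude>`, or treats `<includes>` as a whitelist that drops everything not listed:

<artifactSet>
  <excludes>
    <!-- Drop the whole groups provided by the spark-submit runtime... -->
    <exclude>org.apache.spark:*</exclude>
    <exclude>org.apache.hadoop:*</exclude>
  </excludes>
  <includes>
    <!-- ...but keep this one artifact, which the runtime does not provide. -->
    <include>org.apache.hadoop:hadoop-aws</include>
  </includes>
</artifactSet>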
Thanks!