I am packaging a Spark project into a single jar file, but when I build with Maven the jar pulls in a lot of other jars (100 MB, far too large!). It still does not contain the org/log4j/* classes, which causes an error at run time, yet it does include others such as jboss/netty/*.

I suppose the dependencies pull each other in, so do I have to accept those 100 MB? Even so, the org/log4j/* dependencies are not included.

Is there any way to include only the 10 jars that I specified in my Maven XML file?

 <dependencies>
      <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>4.12</version>
          <scope>test</scope>
      </dependency>
      <dependency>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
          <version>1.2.17</version>
      </dependency>
      <dependency>
          <groupId>org.apache.commons</groupId>
          <artifactId>commons-io</artifactId>
          <version>1.3.2</version>
      </dependency>
      <dependency>
          <groupId>org.apache.commons</groupId>
          <artifactId>commons-lang3</artifactId>
          <version>3.5</version>
      </dependency>
      <dependency>
          <groupId>commons-codec</groupId>
          <artifactId>commons-codec</artifactId>
          <version>1.9</version>
      </dependency>
      <dependency>
          <groupId>com.google.code.gson</groupId>
          <artifactId>gson</artifactId>
          <version>2.6.2</version>
      </dependency>
      <dependency>
          <groupId>org.json</groupId>
          <artifactId>json</artifactId>
          <version>20170516</version>
      </dependency>
      <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>2.8.1</version>
      </dependency>
      <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-sql_2.10</artifactId>
          <version>${spark.version}</version>
      </dependency>
      <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.10</artifactId>
          <version>${spark.version}</version>
      </dependency>
      <dependency>
          <groupId>com.databricks</groupId>
          <artifactId>spark-csv_2.10</artifactId>
          <version>${spark.version}</version>
      </dependency>
  </dependencies>

To create a 'jar-with-dependencies' I use these plugins:

            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>process-sources</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${targetdirectory}</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.6</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

Thanks

MrElephant

3 Answers

First off, many of the libraries we pull in as dependencies depend on other libraries themselves. Sometimes this gets pretty nuts, with all of these things conflicting with each other.

Use the dependency tree to see what conflicts and which library brings in what.

mvn dependency:tree -Dverbose

The verbose flag gives you more info. If you're looking for conflicts, go with:

mvn dependency:tree -Dverbose | grep 'omitted for conflict'

Once you find the stuff you want to exclude, check Dependency Exclusions:

[Goes inside a dependency tag]

<exclusions>
  <exclusion>  <!-- declare the exclusion here -->
    <groupId>sample.ProjectB</groupId>
    <artifactId>Project-B</artifactId>
  </exclusion>
</exclusions>
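
For context, here is a minimal sketch of where that block sits in a pom. It reuses the spark-core_2.10 dependency from the question purely as an illustration; sample.ProjectB / Project-B are still the placeholder coordinates from above, not a real artifact, and should be replaced with whatever `mvn dependency:tree` reports for your build.

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>${spark.version}</version>
    <!-- exclusions are declared per dependency, not on <dependencies> -->
    <exclusions>
      <exclusion>
        <!-- placeholder coordinates: swap in the transitive artifact to drop -->
        <groupId>sample.ProjectB</groupId>
        <artifactId>Project-B</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
</dependencies>
```

An exclusion takes only a groupId and artifactId (no version), and it applies only to what is pulled in through the dependency it is declared on. If another dependency brings in the same artifact, it needs its own exclusion.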
Old Schooled
  • Then I must accept those 100 MB for my release.jar. My build with -Dverbose shows '[INFO] +- log4j:log4j:jar:1.2.17:compile' without any conflict – MrElephant Jul 28 '17 at 12:16
  • Sounds like it to me. If there are no conflicts then there isn't any superfluous library that you can exclude without likely screwing up your program. All the extra stuff being imported is libraries that your main dependencies depend on. – Old Schooled Jul 28 '17 at 12:33
  • But other libraries show '+- (log4j:log4j:jar:1.2.16:compile - omitted for conflict with 1.2.17)'. I understand that in case of conflict it omits version 1.2.16, but version 1.2.17 should be included – MrElephant Jul 28 '17 at 12:38
  • 1.2.17 is included. It says, "1.2.16.... omitted for conflict" so that was omitted, the other one stayed. – Old Schooled Jul 28 '17 at 12:42
  • Maven usually packs in more libraries than you need - but it is hard to figure out if you _really_ don't need a given library and Maven offers no help with that. – J Fabian Meier Jul 28 '17 at 14:59

There is a dependency tree, i.e. all your dependencies are pulling in their own dependencies and so on.

To show your tree use

mvn dependency:tree

See https://maven.apache.org/plugins/maven-dependency-plugin/tree-mojo.html

Essex Boy

If you really only want to include the dependencies you explicitly listed in your pom, you can use wildcard (*) exclusions on all of your dependencies. These are explained in

https://stackoverflow.com/a/7556707/927493

Be aware that your program might not compile anymore. Even more likely are ClassNotFoundException or NoClassDefFoundError failures at runtime, because one of your dependencies tries to call a method from one of its own dependencies, but you excluded all transitive dependencies.
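
As a rough sketch, a wildcard exclusion on one of the dependencies from the question could look like the following. hadoop-common is chosen only for illustration, and as far as I know wildcard exclusions require Maven 3.2.1 or newer.

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.8.1</version>
  <exclusions>
    <!-- drops every transitive dependency of hadoop-common -->
    <exclusion>
      <groupId>*</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

The same block would have to be repeated on each dependency whose transitive jars should be left out. Anything the code still needs at runtime must then be declared explicitly as a direct dependency.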

J Fabian Meier