
I have a Maven project that uses es-spark to read from Elasticsearch. My pom.xml is as follows:

  <groupId>com.jzdata.logv</groupId>
  <artifactId>es-spark</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>es-spark</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark_2.11</artifactId>
        <version>2.1.2</version>
    </dependency>
  </dependencies>

   <build>
    <plugins>
     <plugin>  
       <groupId>org.apache.maven.plugins</groupId>  
       <artifactId>maven-compiler-plugin</artifactId>
       <version>3.1</version>  
       <configuration>  
         <source>1.7</source>  
         <target>1.7</target>  
       </configuration>  
     </plugin>
     <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>2.6</version>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <classpathPrefix>lib/</classpathPrefix>
              <mainClass>my.main.class</mainClass>
            </manifest>
           </archive>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>2.10</version>
        <executions>
          <execution>
            <id>copy-dependencies</id>
            <phase>package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <outputDirectory>${project.build.directory}/lib</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>  
  </build>  

My dependency follows the elasticsearch-spark installation instructions.

I want to package a runnable JAR file together with its dependency JARs.

When I run mvn package, the build fails saying it cannot find the packages org.apache.spark and org.apache.spark.api.java, but these packages are in my Maven dependencies.

What am I doing wrong?

fmyblack

1 Answer


The library is intended for use in Spark applications; it assumes that the Spark dependencies will be available wherever it is loaded.

Similarly, *you* are expecting the Spark dependencies to be available when your application runs: RDDs, DataFrames, and SparkContext are all part of Spark (see my comment below).

The problem is that you haven't indicated this to the compiler; it thinks you are using libraries that will not be available during execution. Think of it this way: the build fails because the compiler doesn't believe your application will work.

To fix the problem, you must tell the compiler that you expect the Spark libraries to be available during execution:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>1.6.0</version>
        <scope>provided</scope>
    </dependency>

*Important* You need to exclude the Spark libraries from your artifact; otherwise you may end up with more than one version of Spark on your classpath (there is no reason to include them anyway: Spark is loading your application!). Setting the scope to provided tells the compiler that you expect Spark to be available at runtime and that it should be excluded from the output.
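Putting it together, the dependencies section of the question's pom.xml could look like the sketch below. The Spark version shown (1.6.0) is only illustrative; it should match whatever Spark version your cluster runs and what elasticsearch-spark 2.1.2 was built against.

```xml
<dependencies>
  <dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark_2.11</artifactId>
    <version>2.1.2</version>
  </dependency>
  <!-- Spark is supplied by the cluster at runtime, so mark it "provided":
       visible to the compiler, but left out of the packaged artifact
       and out of the lib/ directory copied by maven-dependency-plugin. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

With this in place, mvn package compiles against Spark but excludes it from the output, so only the application JAR and its non-provided dependencies end up in target/.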

https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope

AssHat_
  • The elasticsearch-spark_2.1 library has this same dependency and scope (the Spark version may differ). The side effect of adding elasticsearch-spark_2.1 as a dependency is that Spark becomes a (transitive) dependency. A transitive dependency with provided scope isn't common; Spark's framework is one of the rare cases where it makes sense. – AssHat_ Feb 21 '16 at 10:06