
I'm trying to run my Spark application on a Hadoop cluster. The Spark version running on the cluster is 1.3.1. I get the error posted below when I package and run my application on the cluster. I've looked at other posts as well; it seems I'm messing up the library dependencies, but I couldn't figure out what!

Here is some other information that might help you help me out:

hadoop version:

Hadoop 2.7.1.2.3.0.0-2557
Subversion git@github.com:hortonworks/hadoop.git -r          9f17d40a0f2046d217b2bff90ad6e2fc7e41f5e1
Compiled by jenkins on 2015-07-14T13:08Z
Compiled with protoc 2.5.0
From source with checksum 54f9bbb4492f92975e84e390599b881d
This command was run using /usr/hdp/2.3.0.0-2557/hadoop/lib/hadoop-common-2.7.1.2.3.0.0-2557.jar

The error stack:

java.lang.NoSuchMethodError: org.apache.spark.sql.hive.HiveContext: method <init>(Lorg/apache/spark/api/java/JavaSparkContext;)V not found
at com.cyber.app.cyberspark_app.main.Main.main(Main.java:163)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

My pom.xml looks like this:

<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>path.to.my.main.Main</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id> <!-- this is used for inheritance merges -->
                    <phase>package</phase> <!-- bind to the packaging phase -->
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <dependency> <!-- Spark dependency -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>1.6.1</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

I'm using "mvn package" to package my jar.

EDIT:

  1. I tried changing all the versions to 1.3.1. If I make that change, I have to rewrite parts of my application, because I'm using features that only became available after 1.3.1.

  2. But if I put everything on 1.6.1 compiled against Scala 2.10, I get the same error (roughly the dependency setup sketched below).
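
Roughly, that second attempt amounted to the dependency section below. This is a reconstruction from memory, so treat the exact coordinates as approximate:

<!-- attempt 2: every Spark artifact on 1.6.1, built against Scala 2.10 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.6.1</version>
    <scope>provided</scope>
</dependency>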

Please let me know if I need to provide any additional information. Any help will be greatly appreciated.

Thank you.

accssharma

1 Answer


This looks like a binary compatibility issue.

First, make sure that all your Spark dependencies are on Spark 1.3.1. I see that you have Spark SQL on 1.6.1.

Second, you are using Spark artifacts compiled against Scala 2.11. The typical Spark distribution is compiled only against 2.10; if you want a 2.11 build, you usually need to compile Spark yourself.

If you are not sure which Scala version the Spark on your cluster was compiled with, I would change all the dependencies to use "2.10" instead of "2.11" and try again.
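
Concretely, assuming the cluster runs a stock Spark 1.3.1 built against Scala 2.10, the dependency section would look roughly like the sketch below. It is not your exact pom: I've marked everything "provided" on the assumption that the cluster supplies Spark at runtime, so adjust the scopes to match how you package and submit.

<dependencies>
    <!-- all Spark artifacts pinned to the cluster's version (1.3.1) and to Scala 2.10 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.3.1</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

If the build then fails because you rely on APIs added after 1.3.1, that failure is pointing at the same mismatch as the NoSuchMethodError you are seeing at runtime.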

marios
  • Actually, I was building all the dependencies using Spark compiled against Scala 2.10 previously. Then I tried changing the versions to see if I could find compatible ones. But yeah, it wasn't working with Spark compiled against Scala 2.10 either. Any other suggestions? – accssharma Apr 28 '16 at 01:48
  • I updated the answer; let me know if that works for you. – marios Apr 28 '16 at 02:58
  • If I change everything to Spark 1.3.1, then I have issues in the application, as I've been using Spark features added after 1.3.1. I've added an EDIT section to my question as well. – accssharma Apr 28 '16 at 16:01
  • Ok, I see the issue now. Unfortunately you cannot do that. You need to compile against the same version as your running instance; anything else will cause issues similar to what you have. Your best bet is to push your engineering team to upgrade Spark. – marios Apr 28 '16 at 17:01
  • We have all our Hadoop services bundled with Hortonworks, and we're on HDP 2.3, which only supports Spark 1.3.1 at most (or 1.4.1, I need to talk to our IT engineer). With either of these older versions I won't be able to leverage all the features I've been using, so it seems we need to upgrade the whole HDP bundle to the latest. Thanks @marios for helping out. Really appreciate the help. :-) – accssharma Apr 28 '16 at 19:47
  • Hey @marios, I've posted another question related to Spark application deployment. Can you help me figure this out? [Question](http://stackoverflow.com/questions/36945467/various-apache-sparks-deployment-problems). Thanks. – accssharma Apr 29 '16 at 18:48