
Hive version: 2.0.0
Spark version: 2.3.0
YARN as the scheduler.

They aren't compatible out of the box, so I had to set the configs below to make them work together:

spark.sql.hive.metastore.version 2.0.0
spark.sql.hive.metastore.jars /usr/local/apache-hive-2.0.0-bin/lib/*

I am able to successfully run Hive queries on the Spark cluster using spark-sql. However, when I run a query from the Hive CLI, I get the error below (as seen in the Hive logs):

2021-10-17T03:06:53,727 INFO  [1ff8e619-80bb-46ea-9fd0-824d57ea3799 1ff8e619-80bb-46ea-9fd0-824d57ea3799 main]: client.SparkClientImpl (SparkClientImpl.java:startDriver(428)) - Running client driver with argv: /usr/local/spark/bin
/spark-submit --properties-file /tmp/spark-submit.255205804744246105.properties --class org.apache.hive.spark.client.RemoteDriver /usr/local/apache-hive-2.0.0-bin/lib/hive-exec-2.0.0.jar --remote-host <masked_hostname> --remote-port 34537 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
2021-10-17T03:06:54,488 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
2021-10-17T03:06:54,489 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
2021-10-17T03:06:54,489 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
2021-10-17T03:06:54,489 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
2021-10-17T03:06:54,489 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
2021-10-17T03:06:55,000 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
2021-10-17T03:06:55,000 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.lang.ClassLoader.defineClass1(Native Method)
2021-10-17T03:06:55,000 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
2021-10-17T03:06:55,000 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
2021-10-17T03:06:55,000 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
2021-10-17T03:06:55,000 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
2021-10-17T03:06:55,001 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
2021-10-17T03:06:55,001 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
2021-10-17T03:06:55,001 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.security.AccessController.doPrivileged(Native Method)
2021-10-17T03:06:55,001 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
2021-10-17T03:06:55,001 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
2021-10-17T03:06:55,001 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
2021-10-17T03:06:55,001 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.lang.Class.forName0(Native Method)
2021-10-17T03:06:55,001 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.lang.Class.forName(Class.java:348)
2021-10-17T03:06:55,001 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at org.apache.spark.util.Utils$.classForName(Utils.scala:235)
2021-10-17T03:06:55,002 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:836)
2021-10-17T03:06:55,002 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
2021-10-17T03:06:55,002 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
2021-10-17T03:06:55,002 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
2021-10-17T03:06:55,002 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2021-10-17T03:06:55,003 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - Caused by: java.lang.ClassNotFoundException: org.apache.spark.JavaSparkListener
2021-10-17T03:06:55,003 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
2021-10-17T03:06:55,003 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
2021-10-17T03:06:55,003 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

I have also added the Spark libraries to the Hive classpath, following the "Spark as execution engine with Hive" instructions.
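For reference, one common way to expose Spark's jars to the Hive CLI is to symlink them into Hive's lib directory. The helper below is a sketch, not an official Hive tool: the function name is mine, and the paths and jar globs are assumptions based on the versions in the question.

```shell
# Sketch: symlink the Spark jars Hive needs into Hive's lib directory.
# Paths and jar names are assumptions; adjust to your installation.
link_spark_jars() {
  spark_home="${1:-/usr/local/spark}"
  hive_home="${2:-/usr/local/apache-hive-2.0.0-bin}"
  for jar in "$spark_home"/jars/spark-core_*.jar \
             "$spark_home"/jars/spark-network-common_*.jar \
             "$spark_home"/jars/scala-library-*.jar; do
    [ -e "$jar" ] || continue               # skip globs that matched nothing
    ln -sfn "$jar" "$hive_home/lib/$(basename "$jar")"
  done
}

# Usage: link_spark_jars /usr/local/spark /usr/local/apache-hive-2.0.0-bin
```

Note that linking the jars only makes the classes visible; it does not resolve version conflicts between what Hive was compiled against and what Spark ships.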

Any suggestions on how to fix the above error?

Dhanush D
  • It would appear you've not correctly added the Spark jars to the Hive classpath. Can you show what files you did add and to where? – OneCricketeer Oct 17 '21 at 12:44
  • `JavaSparkListener` was removed in Spark 2.0. The `spark-client-*.jar` in Hive 2.0 is compiled against Spark 1.5, so just replacing the Spark jars it references breaks the dependencies. https://github.com/apache/spark/blob/branch-1.6/core/src/main/java/org/apache/spark/JavaSparkListener.java – mazaneicha Oct 17 '21 at 13:02
  • @mazaneicha So there's no way to have the above integration running? Hive version: 2.0.0 Spark 2.3.0 – Dhanush D Oct 18 '21 at 07:33
  • According to https://issues.apache.org/jira/browse/HIVE-14029, not without upgrading your Hive version. I personally think both Hive 2.0 and Spark 2.3 are quite old; it's time to upgrade both! :) – mazaneicha Oct 18 '21 at 12:35
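The removal mazaneicha describes is easy to confirm by scanning the installed Spark jars for the missing class. The helper below is a quick heuristic of my own (not Hive/Spark tooling): zip entry names are stored uncompressed inside a jar, so a plain grep on the jar bytes is enough to spot a class entry.

```shell
# Heuristic check: print the jars in a directory that contain a given class.
find_class_in_jars() {
  dir="$1"
  entry="$(printf '%s' "$2" | tr '.' '/').class"   # dotted name -> zip entry path
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue                      # skip globs that matched nothing
    if grep -qF "$entry" "$jar"; then
      printf '%s\n' "$jar"
    fi
  done
}

# Usage, with the path from the question:
#   find_class_in_jars /usr/local/spark/jars org.apache.spark.JavaSparkListener
# On Spark 2.x this should print nothing, since the class was removed in 2.0.
```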

2 Answers

2021-10-17T03:06:55,000 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener

This suggests a Java compatibility error. Please check your Java version; I recommend using Java 1.8 only. Check the dependencies as well. https://issues.apache.org/jira/browse/HIVE-14029

rajashree
  • The Java version is 1.8 already: `openjdk version "1.8.0_302"; OpenJDK Runtime Environment (build 1.8.0_302-b08); OpenJDK 64-Bit Server VM (build 25.302-b08, mixed mode)` – Dhanush D Oct 17 '21 at 05:48
  • NoClassDefFoundError is not a java version issue. It's a classpath issue only – OneCricketeer Oct 17 '21 at 12:43

I would recommend generating a dependency graph using Gradle/Maven and excluding any ambiguous dependencies, because it looks like the dependencies were not added properly, or the version of a depended-on jar is clashing at execution time.

  • Spark and Hive already use Maven. There's nothing being compiled. Unclear what you're suggesting here – OneCricketeer Oct 17 '21 at 12:41
  • Since the class is not found, the dependency jars are not properly on the classpath. So I suggested checking whether the **JavaSparkListener** class is available on the classpath. – Amiya Mishra Oct 18 '21 at 04:15
  • If you see the comments above, that class was explicitly removed from the Spark code base. There's nothing to add/exclude from the user because it's a pre-packaged Hive library that still requires an older version of Spark – OneCricketeer Oct 18 '21 at 14:10