
I was trying out the hbase-spark connector. To start with, I am trying out this code.
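For context, the stack trace below fails inside the `JavaHBaseContext` constructor, so the relevant part of the code looks roughly like this (a minimal sketch; the app name and local master are my own placeholders, not taken from the linked code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.spark.JavaHBaseContext;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class App {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf()
                .setAppName("hbase-spark-poc")   // placeholder app name
                .setMaster("local[*]");          // placeholder: local mode
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // Standard HBase client configuration (reads hbase-site.xml from the classpath)
        Configuration hbaseConf = HBaseConfiguration.create();

        // This constructor call is where the NoClassDefFoundError below is thrown
        JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, hbaseConf);

        // ... bulk get/put/scan operations via hbaseContext ...
        jsc.stop();
    }
}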

My pom dependencies are:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-spark</artifactId>
        <version>2.0.0-alpha4</version>
    </dependency>
</dependencies>

I am getting following exception while running the code:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.hadoop.hbase.spark.JavaHBaseContext.<init>(JavaHBaseContext.scala:46)
    at com.myproj.poc.sparkhbaseneo4j.App.main(App.java:71)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 14 more

The failing call at com.myproj.poc.sparkhbaseneo4j.App.main(App.java:71) corresponds to line 67 in the GitHub code.

I checked this thread. It says that I should use the same versions of all the libraries. Earlier, I had 2.3.0 versions of the Spark libraries in my pom, but I realized that the latest version of hbase-spark is 2.0.0-alpha4, so I downgraded all the Spark libraries to 2.0.0 to match. But I am still getting the same exception.
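(For reference, since the versions are meant to match, I keep the Spark artifacts on a single Maven property so a mismatch cannot creep back in; this is just the standard idiom applied to my pom:)

<properties>
    <spark.version>2.0.0</spark.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>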

Or do I have to stick to 1.x.x versions only to use this, since this answer says org.apache.spark.Logging was removed after version 1.5.2?


1 Answer


Sean Owen suggested in http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Hit-quot-Exception-in-thread-main-java-lang-NoClassDefFoundError/td-p/44486 that "You shouldn't use org.apache.spark.Logging in your app at all. That's likely the problem and solution."

So you should be using the following dependency (or a higher version of it):

<!-- https://mvnrepository.com/artifact/commons-logging/commons-logging -->
<dependency>
    <groupId>commons-logging</groupId>
    <artifactId>commons-logging</artifactId>
    <version>1.1.1</version>
</dependency>

Updated

From the comments: "I specified that and now I am getting NoClassDefFoundError: org/apache/spark/streaming/dstream/DStream"

For the above issue you need the following dependency:

<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.0</version>
    <scope>provided</scope>
</dependency>
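More generally, when a NoClassDefFoundError points at a class you never reference yourself, running mvn dependency:tree shows which of your artifacts pulls in (or fails to pull in) the jar that should contain that class.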
  • **(Q1)** Did you mean to say that I should add the dependency you specified? I specified that, and now I am getting `NoClassDefFoundError: org/apache/spark/streaming/dstream/DStream`. Am I missing something fundamental here? **(Q2)** Also, I didn't get how adding that dependency resolved the missing `Logging` class exception. I did not use the `Logging` class anywhere in my code. It seems that the connector uses that class internally, but then how was the error resolved by adding that dependency? It should be resolved only by adding a dependency containing `org.apache.spark.Logging`, shouldn't it? – Mahesha999 Apr 27 '18 at 13:17
  • I have updated my answer. Dependencies are resolved by looking at the available dependencies. You added Apache commons-logging, which helped resolve your first dependency issue. Spark also uses external dependencies and packages them in its own jars. – Ramesh Maharjan Apr 27 '18 at 13:43
  • One small question. I want to get the row keys for the top 10 values of some column in my HBase table. For this, can I use the hbase-spark connector to fetch all of the values into Spark RDDs and then perform the sorting in memory through Spark SQL? Will this be fast? Or should I follow some other approach? Or is it not doable in HBase using any framework? – Mahesha999 Apr 28 '18 at 05:13
  • One small request for you to upvote this answer if it helped you ;). Now, for your concern, I guess you can write Spark SQL queries to fetch only the top 10 records from HBase (see the sketch after this thread). And yes, it should be possible to do it in HBase alone. – Ramesh Maharjan Apr 28 '18 at 05:14
  • And if you cannot find a solution yourself or on Stack Overflow, you are always welcome to ask another question. – Ramesh Maharjan Apr 28 '18 at 05:15
  • Sorry, I knew I could do it with Spark SQL. The main question was about performance (will it be fast?) and how Spark will do it: by fetching the whole (possibly huge) HBase column onto a single node? – Mahesha999 Apr 28 '18 at 13:48
  • I cannot say for sure, but I guess it would be distributed rather than done on a single node. – Ramesh Maharjan Apr 28 '18 at 14:53
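To illustrate the RDD route discussed in the comment thread above: this is my own hypothetical sketch, not code from either party. The table name my_table, the column family cf, and the long-encoded qualifier score are placeholder assumptions. The scan itself runs distributed across the HBase regions; only the final ten pairs are collected to the driver.

import java.util.List;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.spark.JavaHBaseContext;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class TopRowKeys {

    // Row keys of the 10 largest values in cf:score (all names are placeholders)
    static List<Tuple2<Long, String>> top10RowKeys(JavaHBaseContext hbaseContext) {
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("score"));

        // Distributed scan: roughly one Spark partition per HBase region
        JavaRDD<Tuple2<ImmutableBytesWritable, Result>> rows =
                hbaseContext.hbaseRDD(TableName.valueOf("my_table"), scan);

        // (value, rowKey) pairs, assuming the cell holds a big-endian long
        PairFunction<Tuple2<ImmutableBytesWritable, Result>, Long, String> toPair = t -> {
            Result r = t._2();
            long score = Bytes.toLong(
                    r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("score")));
            return new Tuple2<>(score, Bytes.toString(r.getRow()));
        };

        return rows.mapToPair(toPair)
                .sortByKey(false) // descending by value, still distributed
                .take(10);        // only these 10 pairs reach the driver
    }
}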