
I am trying to connect to an HBase database from an Eclipse Scala program on Windows.

The cluster is secured with Kerberos authentication, so the program is not able to connect to the HBase database.

Every time, we build the jar file and run it on the cluster, but this is not practical for development and debugging.

How do I put hbase-site.xml on the classpath?

I downloaded the *-site.xml files and tried adding hbase-site.xml, core-site.xml and hdfs-site.xml as a source folder, and also tried adding these files as an external class folder in the project build path, but nothing works. How do I make this work?

Is there any way to set hbase-site.xml on the sqlContext, since I am using sqlContext to read the HBase tables with the Hortonworks connector?
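
For example, something along these lines is what I am hoping for (a rough sketch only; the config file paths, principal and keytab below are placeholders, not my actual setup):

    import java.io.File
    import org.apache.hadoop.security.UserGroupInformation
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("HBaseBroadcast").setMaster("local[*]"))
    // sqlContext is what I use with the Hortonworks connector to read the HBase tables
    val sqlContext = new SQLContext(sc)

    // Point the Hadoop configuration used by this SparkContext at the cluster's
    // client configs copied to the Windows machine (paths are placeholders)
    val hadoopConf = sc.hadoopConfiguration
    hadoopConf.addResource(new File("C:/hdp-conf/core-site.xml").toURI.toURL)
    hadoopConf.addResource(new File("C:/hdp-conf/hdfs-site.xml").toURI.toURL)
    hadoopConf.addResource(new File("C:/hdp-conf/hbase-site.xml").toURI.toURL)

    // Kerberos login from a keytab (principal and keytab path are placeholders)
    UserGroupInformation.setConfiguration(hadoopConf)
    UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "C:/keytabs/user.keytab")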

The error log is:

Exception in thread "main" java.io.IOException: java.lang.reflect.InvocationTargetException
       at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
       at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
       at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
       at org.apache.spark.sql.execution.datasources.hbase.RegionResource.init(HBaseResources.scala:93)
       at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.liftedTree1$1(HBaseResources.scala:57)
       at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.acquire(HBaseResources.scala:54)
       at org.apache.spark.sql.execution.datasources.hbase.RegionResource.acquire(HBaseResources.scala:88)
       at org.apache.spark.sql.execution.datasources.hbase.ReferencedResource$class.releaseOnException(HBaseResources.scala:74)
       at org.apache.spark.sql.execution.datasources.hbase.RegionResource.releaseOnException(HBaseResources.scala:88)
       at org.apache.spark.sql.execution.datasources.hbase.RegionResource.<init>(HBaseResources.scala:108)
       at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:60)
       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
       at scala.Option.getOrElse(Option.scala:120)
       at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
       at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
       at scala.Option.getOrElse(Option.scala:120)
       at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
       at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
       at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
       at scala.Option.getOrElse(Option.scala:120)
       at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
       at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190)
       at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
       at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
       at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
       at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
       at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
       at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
       at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
       at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
       at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374)
       at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
       at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374)
       at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456)
       at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170)
       at org.apache.spark.sql.DataFrame.show(DataFrame.scala:350)
       at org.apache.spark.sql.DataFrame.show(DataFrame.scala:311)
       at org.apache.spark.sql.DataFrame.show(DataFrame.scala:319)
       at scb.HBaseBroadcast$.main(HBaseBroadcast.scala:106)
       at scb.HBaseBroadcast.main(HBaseBroadcast.scala)
Caused by: java.lang.reflect.InvocationTargetException
       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
       at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
       ... 44 more
Caused by: java.lang.AbstractMethodError: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy()Lorg/apache/hadoop/io/retry/FailoverProxyProvider$ProxyInfo;
       at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:73)
       at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:64)
       at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
       at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:147)
       at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
       at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
       at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
       at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
       at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
       at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
       at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
       at org.apache.hadoop.hbase.util.DynamicClassLoader.<init>(DynamicClassLoader.java:104)
       at org.apache.hadoop.hbase.protobuf.ProtobufUtil.<clinit>(ProtobufUtil.java:241)
       at org.apache.hadoop.hbase.ClusterId.parseFrom(ClusterId.java:64)
       at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:75)
       at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
       at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:879)
       at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:635)
       ... 49 more
Shankar

1 Answer


You have a Hadoop HDFS client (hadoop-hdfs) version conflict. Check the version on the server against the one on your development classpath.
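
For example, print the client-side versions from the Eclipse project and compare them with the output of `hadoop version` and `hbase version` on the cluster (a quick sketch):

    // Prints the Hadoop/HBase versions that are actually on the development classpath;
    // compare the output with `hadoop version` and `hbase version` on the cluster.
    object VersionCheck {
      def main(args: Array[String]): Unit = {
        println("Hadoop on classpath: " + org.apache.hadoop.util.VersionInfo.getVersion)
        println("HBase on classpath:  " + org.apache.hadoop.hbase.util.VersionInfo.getVersion)
      }
    }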

John Leach
  • Thanks, let me check that. I haven't added Hadoop as a separate dependency; I just configured Spark 1.6.1, which pulls in all the dependencies for that version. On the cluster we are using HDP 2.4.2, which ships Spark 1.6.1. I will verify both Hadoop versions, thanks. – Shankar Nov 19 '16 at 04:08
  • Shankar, we have this problem continuously at Splice Machine. We have just moved up to Spark 2.0, which is a little bit better. If you have more problems (which I suspect you will), respond to this thread. The usual problems for us are Guava, Jackson, and servlet-api conflicts. Good luck! – John Leach Nov 19 '16 at 15:24
  • Yours is not a Kerberos authentication failure. AFAIK it cannot connect because of a jar file mismatch. I faced the same issue long ago; run `hbase classpath` on the cluster to see the jar files present on the cluster side, and the same versions of those jars need to be present on the Windows Eclipse .classpath. – Ram Ghadiyaram Nov 19 '16 at 17:31
  • I did the following to configure it correctly: `public static void printClassPathResources() { final ClassLoader cl = ClassLoader.getSystemClassLoader(); final URL[] urls = ((URLClassLoader) cl).getURLs(); LOG.info("Print All Class path resources under currently running class"); for (final URL url : urls) { LOG.info(url.getFile()); } }` Check the classpath with this code on both sides. – Ram Ghadiyaram Nov 19 '16 at 17:44
  • Scala version, since you are using Scala: `import java.net.URL import java.net.URLClassLoader import scala.collection.JavaConversions._ object App { def main(args: Array[String]) { val cl = ClassLoader.getSystemClassLoader val urls = cl.asInstanceOf[URLClassLoader].getURLs for (url <- urls) { println(url.getFile) } } }` – Ram Ghadiyaram Nov 19 '16 at 17:55
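
Following the jar-matching advice in the comments above, a minimal build sketch (assuming an sbt build; the Hadoop 2.7.1 / HBase 1.1.2 versions below are assumptions based on stock HDP 2.4.2 — substitute whatever `hbase classpath` actually reports on the cluster):

    // build.sbt sketch: pin the client jars to the versions the cluster runs,
    // so the Eclipse/IDE classpath matches the server side.
    // Versions are assumptions for HDP 2.4.2; verify with `hadoop version` and `hbase classpath`.
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"    % "1.6.1",
      "org.apache.spark"  %% "spark-sql"     % "1.6.1",
      "org.apache.hadoop" %  "hadoop-client" % "2.7.1",
      "org.apache.hadoop" %  "hadoop-hdfs"   % "2.7.1",
      "org.apache.hbase"  %  "hbase-client"  % "1.1.2",
      "org.apache.hbase"  %  "hbase-server"  % "1.1.2"
    )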