I'm trying to connect to HBase using the HBase client API in a Kerberized Cloudera cluster.

Sample code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.spark.api.java.JavaPairRDD;

    Configuration hbaseConf = HBaseConfiguration.create();
    /*hbaseConf.set("hbase.master", "somenode.net:2181");
    hbaseConf.set("hbase.client.scanner.timeout.period", "1200000");
    hbaseConf.set("hbase.zookeeper.quorum", "somenode.net,somenode2.net");
    hbaseConf.set("zookeeper.znode.parent", "/hbase");*/
    hbaseConf.setInt("timeout", 120000);
    hbaseConf.set(TableInputFormat.INPUT_TABLE, tableName);
    //hbaseConf.addResource("src/main/resources/hbase-site.xml");
    UserGroupInformation.setConfiguration(hbaseConf);
    UserGroupInformation.loginUserFromKeytab("principal", "keytab");
    JavaPairRDD<ImmutableBytesWritable, Result> javaPairRdd = ctx
            .newAPIHadoopRDD(hbaseConf, TableInputFormat.class,
                    ImmutableBytesWritable.class, Result.class);

I tried putting hbase-site.xml in the Maven project's resources, and also packaged it into a jar and passed it to the spark-submit command using --jars, but neither worked.
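For context: when the cluster's hbase-site.xml is not picked up, the Kerberos-related properties are missing from the client configuration entirely. A sketch of the entries such a file would normally carry on a secured cluster (hostnames, realm, and principals below are placeholders, not taken from the original post):

```xml
<configuration>
  <!-- ZooKeeper quorum the HBase client should contact -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>somenode.net,somenode2.net</value>
  </property>
  <!-- Enable Kerberos for both Hadoop and HBase RPC -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hbase.security.authentication</name>
    <value>kerberos</value>
  </property>
  <!-- Server principals the client uses to authenticate the services -->
  <property>
    <name>hbase.master.kerberos.principal</name>
    <value>hbase/_HOST@EXAMPLE.COM</value>
  </property>
  <property>
    <name>hbase.regionserver.kerberos.principal</name>
    <value>hbase/_HOST@EXAMPLE.COM</value>
  </property>
</configuration>
```

Without the two `kerberos` authentication properties, the client attempts a plain (SIMPLE) connection to a secured RegionServer, which the server drops, and that can surface as the broken-pipe / call-timeout errors seen below.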

Error log:

Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68545: row '¨namespace:test,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hostname.net,60020,1511970022474, seqNum=0
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
        at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:278)
        at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:266)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:920)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1242)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)
        at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:394)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:203)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:64)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:360)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:334)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
        ... 4 more
18/02/26 16:25:42 INFO spark.SparkContext: Invoking stop() from shutdown hook
Shankar

1 Answer


The problem you are facing is that your environment is not set up properly.

I have answered my own question here.

[Screenshot of the relevant configuration]

Manas
  • When you say change the HBase service to HBase instead of None, where exactly do I need to modify that? Is it in Cloudera Manager? – Shankar Feb 26 '18 at 18:33
  • I have updated my answer with a screenshot that shows exactly which configuration needs to be changed. Please let me know if it works, and if it does, please accept my answer. – Manas Feb 26 '18 at 18:36
  • Thanks, I will check it tomorrow and update you, since I don't have access right now. – Shankar Feb 26 '18 at 18:40
  • I don't see the HBase Service option under the Spark configuration. Is it something I can set in the code? – Shankar Feb 27 '18 at 09:05
  • Are you providing the Cloudera principal and keytab with your Spark job? – Manas Feb 27 '18 at 09:10
  • I see that your principal and keytab are just strings named "principal" and "keytab". You don't need that; just passing the actual principal and keytab to your Spark app should suffice. – Manas Feb 27 '18 at 09:12
  • Which Cloudera distribution are you using? I am assuming you have both Spark and HBase installed through Cloudera. Post the Spark config and maybe that will lead me somewhere, particularly after searching for "hbase" in the Spark config. – Manas Feb 27 '18 at 12:42
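Following up on the comments above about passing the principal and keytab to the job: on YARN, spark-submit has dedicated --principal and --keytab options, and --files can ship the cluster's hbase-site.xml to the executors. A sketch of such an invocation (the principal, paths, class name, and jar name are placeholders, not from the original post):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal user@EXAMPLE.COM \
  --keytab /path/to/user.keytab \
  --files /etc/hbase/conf/hbase-site.xml \
  --class com.example.HBaseReadJob \
  my-job.jar
```

With the keytab handed to Spark this way, the explicit UserGroupInformation.loginUserFromKeytab call in the driver code is typically unnecessary, since Spark handles the Kerberos login and ticket renewal itself.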