Spark/Phoenix with Kerberos on YARN

Question

I have a Spark (1.4.1) application that runs on a non-kerberized cluster. I copied it to another cluster that has Kerberos enabled. The application reads data from HDFS and writes it to Phoenix.

However, it does not work:

    ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
    javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
            at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
            at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
            at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:611)
            at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:156)
            at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:737)
            at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:734)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
            at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
            at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:887)
            at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
            at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1200)
            at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
            at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
            at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:50918)
            at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1564)
            at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1502)
            at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1524)
            at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1553)
            at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1704)
            at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
            at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
            at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3917)
            at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:441)
            at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:463)
            at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:815)
            at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1215)
            at org.apache.phoenix.query.DelegateConnectionQueryServices.createTable(DelegateConnectionQueryServices.java:112)
            at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:1902)
            at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:744)
            at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:186)
            at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:304)
            at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:296)
            at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
            at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:294)
            at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1243)
            at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1893)
            at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1862)
            at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
            at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1862)
            at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180)
            at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132)
            at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151)
            at java.sql.DriverManager.getConnection(DriverManager.java:664)
            at java.sql.DriverManager.getConnection(DriverManager.java:208)
            at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:99)
            at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
            at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:45)
            at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getSelectColumnMetadataList(PhoenixConfigurationUtil.java:263)
            at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:109)
            at org.apache.phoenix.spark.SparkSqlContextFunctions.phoenixTableAsDataFrame(SparkSqlContextFunctions.scala:37)
            at com.bosch.asc.utils.HBaseUtils$.scanPhoenix(HBaseUtils.scala:123)
            at com.bosch.asc.SMTProcess.addLookup(SMTProcess.scala:1125)
            at com.bosch.asc.SMTProcess.saveMountTraceLogToPhoenix(SMTProcess.scala:1039)
            at com.bosch.asc.SMTProcess.runETL(SMTProcess.scala:87)
            at com.bosch.asc.SMTProcessMonitor$delayedInit$body.apply(SMTProcessMonitor.scala:20)
            at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
            at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
            at scala.App$$anonfun$main$1.apply(App.scala:71)
            at scala.App$$anonfun$main$1.apply(App.scala:71)
            at scala.collection.immutable.List.foreach(List.scala:318)
            at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
            at scala.App$class.main(App.scala:71)
            at com.bosch.asc.SMTProcessMonitor$.main(SMTProcessMonitor.scala:5)
            at com.bosch.asc.SMTProcessMonitor.main(SMTProcessMonitor.scala)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:486)
    Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
            at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
            at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
            at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
            at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
            at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
            at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
            at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
            ... 70 more

I have added

    export _JAVA_OPTIONS="-Djava.security.krb5.conf=/etc/hadoop/krb5.conf"

in my Spark submission script, but to no avail. Do I have to change the code itself to allow for authentication? I had previously assumed that the ticket is just shared between applications, and the code itself does not change.

In case it helps: in the shell I do not see a spark.authenticate option set when I execute:

    sc.getConf.getAll.foreach(println)

See: http://spark.apache.org/docs/latest/security.html

I have very little experience with Kerberos, so any help is greatly appreciated.

To enable the Kerberos debug info: `export HADOOP_JAAS_DEBUG=true` plus `-Dsun.security.krb5.debug=true` – Samson Scharfrichter, Jul 07 '16 at 19:28
https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/secrets.html – Samson Scharfrichter, Jul 07 '16 at 19:29
Do you run Spark in "local" mode? Otherwise the executors may not have a valid Kerberos ticket on the host they are running on, and you must manage the Hadoop auth yourself, cf. http://stackoverflow.com/questions/35332026/issue-scala-code-in-spark-shell-to-retrieve-data-from-hbase/35473941 – Samson Scharfrichter, Jul 07 '16 at 19:35
With debugging enabled I see a bunch of Kerberos tickets being generated, but I still get the same error as well as `yarn.ApplicationMaster: User class threw exception: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=35` caused by `org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.io.IOException: Could not set up IO Streams to ...`. I also found [this link](http://bigdatanoob.blogspot.de/2013/09/connect-phoenix-to-secure-hbase-cluster.html) where it seems that Kerberos requires the connection string to be modified. So no kinit? – Ian, Jul 08 '16 at 06:08

score 0 · Answer 1 · answered Jul 07 '16 at 06:38

Assuming that your cluster was properly kerberized, initialize your credentials with:

    kinit -kt /path/to/keytab/file user/domain@realm
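
For a job running on YARN, a ticket obtained with `kinit` on the submitting host may not be available on the hosts where the driver and executors run. Spark on YARN accepts `--principal` and `--keytab` on `spark-submit` so that it can log in by itself. A minimal sketch, assuming placeholder values throughout: the helper name `build_submit_cmd`, the principal, the keytab path, and the jar name are all hypothetical, and the command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a spark-submit command line that hands the
# Kerberos principal and keytab to Spark on YARN.
# "yarn-cluster" is the Spark 1.x spelling of cluster mode.
build_submit_cmd() {
  local principal="$1" keytab="$2" jar="$3"
  printf 'spark-submit --master yarn-cluster --principal %s --keytab %s %s\n' \
    "$principal" "$keytab" "$jar"
}

# Print the command first so it can be inspected before running it.
# All three arguments are placeholders.
build_submit_cmd "user@EXAMPLE.COM" "/path/to/keytab/file" "app.jar"
```

As the comment below notes, passing these parameters did not by itself resolve the Phoenix error in the asker's setup.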

answered Jul 07 '16 at 06:38

kliew


I did that and I still get the same message. It seems that Phoenix (using the Phoenix/Spark library) does not accept the ticket. I even added the `keytab` and `principal` parameters to Spark. – Ian Jul 07 '16 at 07:14

score 0 · Answer 2 · answered Jul 08 '16 at 08:21

I think the reason is that in Phoenix 4.4 the Phoenix/Spark library does not handle Kerberos principals and keytabs: https://issues.apache.org/jira/browse/PHOENIX-2817.

When I tried to read data from an existing Phoenix table, I got an error that no suitable driver was found, and the JDBC connection string did not contain the keytab and principal in the format shown here: https://phoenix.apache.org/index.html#Connection. This happened even though hbase-site.xml was correctly added and the HBase configuration I passed to Phoenix had these values.
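
The connection format on the linked Phoenix page is `jdbc:phoenix:<ZooKeeper quorum>:<port>:<root node>:<principal>:<keytab file>`, with the trailing parts optional. A minimal sketch of assembling such a URL by hand follows. The function name `phoenix_jdbc_url` and every value passed to it (host, znode, principal, keytab path) are hypothetical placeholders, not taken from the cluster in question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assembles a Phoenix JDBC URL of the form
#   jdbc:phoenix:<quorum>:<port>:<root node>:<principal>:<keytab file>
phoenix_jdbc_url() {
  local quorum="$1" port="$2" znode="$3" principal="$4" keytab="$5"
  printf 'jdbc:phoenix:%s:%s:%s:%s:%s\n' \
    "$quorum" "$port" "$znode" "$principal" "$keytab"
}

# Placeholder values; substitute the ZooKeeper quorum, znode, principal,
# and keytab of the actual cluster.
phoenix_jdbc_url "zk1.example.com" 2181 "/hbase-secure" \
  "user@EXAMPLE.COM" "/path/to/keytab/file"
```

A URL built this way applies to a plain JDBC connection. Whether the Phoenix/Spark library passes the principal and keytab through is the subject of the JIRA issue above.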

answered Jul 08 '16 at 08:21

Ian


When reading the discussion thread for that JIRA, it becomes clear that Kerberos is a *symptom*; the real issue was how Phoenix handles ZooKeeper *(which has its own way of handling Kerberos)*. – Samson Scharfrichter Jul 08 '16 at 09:43
So... did you try the workaround suggested at the bottom of the JIRA thread? Or would you consider upgrading to a more recent version of Phoenix? – Samson Scharfrichter Jul 08 '16 at 09:46
Which one did you try - the workaround, or the upgrade? – Samson Scharfrichter Jul 08 '16 at 09:58
I can't upgrade because that's not in my hands. The workaround does not work for me. – Ian Jul 11 '16 at 05:34
Well, looks like you hit a wall **:-/** – Samson Scharfrichter Jul 11 '16 at 12:23

score 0 · Answer 3 · edited May 23 '17 at 12:03

I was facing the same issue. After a lot of trial and error, I was able to fix it. Please follow the link below for the answer and explanation: Spark Streaming and Phoenix Kerberos issue

answered May 16 '17 at 09:43

nilesh1212

