
Sorry for this question, as it has been raised numerous times on SO, but I am still not able to find a solution for my issue after going through each relevant post. I am using Spark Streaming (1.6.1) with Phoenix (4.4) on a Kerberized HDP 2.4.2 environment and get the exception below when I try to read from or write to HBase. I get the same issue even after skipping the key-ph.conf file in spark-submit.

I had a look at the post below, which describes a problem identical to mine, but I am still not able to find a solution:

https://community.hortonworks.com/questions/56848/spark-cant-connect-to-secure-phoenix.html

Spark can't connect to secure phoenix

Below is my spark-submit command:

spark-submit \
--verbose \
--master yarn-cluster \
--num-executors 2  \
--executor-memory 8g \
--executor-cores 4 \
--conf spark.driver.memory=1024m  \
--files key-ph.conf#key-ph.conf,user.headless.keytab#user.headless.keytab,/etc/hbase/2.4.2.0-258/0/hbase-site.xml \
--jars /usr/hdp/2.4.2.0-258/hbase/lib/hbase-common-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hbase/lib/hbase-client-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hbase/lib/hbase-server-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hbase/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hbase/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/2.4.2.0-258/hbase/lib/guava-12.0.1.jar,/usr/hdp/2.4.2.0-258/phoenix/lib/phoenix-core-4.4.0.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/phoenix/phoenix-4.4.0.2.4.2.0-258-client-spark.jar \
--driver-java-options "-Djava.security.auth.login.config=./key-ph.conf -Dhttp.proxyHost=proxy-host -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxy-host -Dhttps.proxyPort=8080 -Dlog4j.configuration=file:/home/user/spark-log4j/log4j-phoenix-driver.properties" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key-ph.conf -Dlog4j.configuration=file:/home/user/spark-log4j/log4j-phoenix-executor.properties" \
--class com.spark.demo.SampleInsert /home/user/test-ph.jar tableName ZK_IP:2181:/hbase-secure:user@CLIENT.LAN:/home/user/user.headless.keytab

Spark Code:

demoArrDataFrame.write
  .format("org.apache.phoenix.spark")
  .options(Map(
    "table" -> tableName.toUpperCase,
    "zkUrl" -> "ZK_IP:2181:/hbase-secure:user@FORSYS.LAN:/home/user/user.headless.keytab"))
  .mode(SaveMode.Overwrite)
  .save()
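
For reference, the read path hits the same failure. Below is a minimal sketch of it, assuming an existing SQLContext named sqlContext; the table name and the secure zkUrl (host:port:znodeParent:principal:keytab) are the same placeholders used in the write above:

val demoReadDf = sqlContext.read
  .format("org.apache.phoenix.spark")
  .options(Map(
    "table" -> tableName.toUpperCase,
    "zkUrl" -> "ZK_IP:2181:/hbase-secure:user@FORSYS.LAN:/home/user/user.headless.keytab"))
  .load()

// Any action triggers the HBase/Phoenix RPCs and, with a broken Kerberos login, the same GSSException.
demoReadDf.show(10)

Both the write above and this read fail with the stack trace below: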

16/12/05 16:11:36 WARN AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
16/12/05 16:11:36 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:611)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:156)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:737)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:734)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:887)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1200)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58152)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1571)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1509)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1531)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1560)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1711)
    at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
    at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4083)
    at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:528)
    at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:550)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:810)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1174)
    at org.apache.phoenix.query.DelegateConnectionQueryServices.createTable(DelegateConnectionQueryServices.java:112)
    at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:1974)
    at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:770)
    at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:186)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:305)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:297)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:295)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1244)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1850)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1819)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1819)
    at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180)
    at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132)
    at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:99)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:82)
    at org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:70)
    at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getUpsertColumnMetadataList(PhoenixConfigurationUtil.java:232)
    at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$2.apply(DataFrameFunctions.scala:45)
    at org.apache.phoenix.spark.DataFrameFunctions$$anonfun$2.apply(DataFrameFunctions.scala:41)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:717)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    ... 62 more
nilesh1212

2 Answers


I was able to fix this issue by performing the following steps:

1) Pass the required HBase and Phoenix jars via the Spark extra classpath options:

--conf "spark.executor.extraClassPath=/usr/hdp/2.4.2.0-258/hbase/lib/hbase-common-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-client-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-server-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.4.2.0-258/hbase/lib/guava-12.0.1.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" \

--conf "spark.driver.extraClassPath=/usr/hdp/2.4.2.0-258/hbase/lib/hbase-common-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-client-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-server-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.4.2.0-258/hbase/lib/guava-12.0.1.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" \

2) Provide a valid keytab and JAAS config in the Spark extra Java options:

--conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -Djava.security.auth.login.config=./kafka_jaas.conf -Dhttp.proxyHost=PROXY.IP -Dhttp.proxyPort=8080 -Dhttps.proxyHost=PROXY.IP2 -Dhttps.proxyPort=8080" \

--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -Djava.security.auth.login.config=./kafka_jaas.conf" \

Example kafka_jaas.conf file:

KafkaServer {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="/etc/security/keytabs/kafka.service.keytab"
   storeKey=true
   useTicketCache=false
   serviceName="kafka"
   principal="kafka/IP@REALM.LAN";
};
KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useTicketCache=true
   renewTicket=true
   serviceName="kafka";
};
Client {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="/etc/security/keytabs/kafka.service.keytab"
   storeKey=true
   useTicketCache=false
   serviceName="zookeeper"
   principal="kafka/IP@REALM.LAN";
};

3) Configure the --files Spark option:

--files kafka_jaas.conf#kafka_jaas.conf,user.headless.keytab#user.headless.keytab,/etc/hbase/conf/hbase-site.xml#hbase-site.xml \

If you are still not able to connect to secured Phoenix after following these steps, set SPARK_CLASSPATH as below and then run the spark-submit command:

export SPARK_CLASSPATH=/usr/hdp/2.4.2.0-258/hbase/lib/hbase-common-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-client-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-server-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/hbase-protocol-1.1.2.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/hbase/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.4.2.0-258/hbase/lib/guava-12.0.1.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar
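
If it still fails, a quick way to isolate the Kerberos problem from Spark itself is to open a plain Phoenix JDBC connection with the same secure URL. This is only a sketch under the assumption that the jdbc:phoenix URL carries the ZooKeeper quorum, secure znode, principal, and keytab in the same format as the zkUrl above; it throws the same GSSException when the login configuration is still wrong:

import java.sql.DriverManager

object SecurePhoenixCheck {
  def main(args: Array[String]): Unit = {
    // Same ZK quorum, secure znode, principal, and keytab as in the question (placeholders).
    val url = "jdbc:phoenix:ZK_IP:2181:/hbase-secure:user@CLIENT.LAN:/home/user/user.headless.keytab"
    val conn = DriverManager.getConnection(url)
    try {
      // Query a system table just to force an authenticated RPC round trip.
      val rs = conn.createStatement().executeQuery("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 1")
      while (rs.next()) println(rs.getString(1))
    } finally {
      conn.close()
    }
  }
}

Run it with phoenix-client.jar and hbase-site.xml on the classpath (for example via spark-submit with the same --jars and --files as above).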
nilesh1212

I'm pretty sure your issue has been solved by now, but I believe this can help anyone who is still facing the same problem.

The Spark documentation says:

Long-Running Applications: Long-running applications may run into issues if their run time exceeds the maximum delegation token lifetime configured in services it needs to access.

Spark supports automatically creating new tokens for these applications when running in YARN mode. Kerberos credentials need to be provided to the Spark application via the spark-submit command, using the --principal and --keytab parameters.

The provided keytab will be copied over to the machine running the Application Master via the Hadoop Distributed Cache. For this reason, it’s strongly recommended that both YARN and HDFS be secured with encryption, at least.

The Kerberos login will be periodically renewed using the provided credentials, and new delegation tokens for supported services will be created.

Below is an example of a working spark-submit:

    spark-submit -v \
    --principal ${keytabPrincipal} \
    --keytab ${keytabPath} \
    --files ${configDir}/log4j-bdfsap.properties#log4j-bdfsap.properties,${configDir}/jaas.conf#jaas.conf,hdfs:///apps/spark/lib/spark-streaming-kafka-assembly_2.10-1.5.2.2.3.4.7-4.jar#spark-streaming-kafka-assembly_2.10-1.5.2.2.3.4.7-4.jar,hdfs:///apps/hive/conf/hive-site.xml#hive-site.xml,hdfs:///apps/spark/lib/elasticsearch-spark_2.10-2.4.0.jar#elasticsearch-spark_2.10-2.4.0.jar  \
    --driver-java-options "-Djava.security.auth.login.config=jaas.conf -Dlog4j.configuration=log4j-bdfsap.properties" \
 --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf -Dlog4j.configuration=log4j.properties" \
    --conf "spark.yarn.maxAppAttempts=4" \
    --conf "spark.yarn.am.attemptFailuresValidityInterval=1h" \
    --conf "spark.yarn.max.executor.failures=8" \
    --conf "spark.yarn.executor.failuresValidityInterval=1h" \
    --conf "spark.task.maxFailures=8" \
    --conf "spark.hadoop.fs.hdfs.impl.disable.cache=true" \
    --jars hdfs:///apps/spark/lib/spark-streaming-kafka-assembly_2.10-1.5.2.2.3.4.7-4.jar,hdfs:///apps/spark/lib/elasticsearch-spark_2.10-2.4.0.jar \
    --name ${APP_NAME} \
    --class ${APP_CLASS} \
    --master yarn-cluster \
    --driver-cores ${DRIVER_CORES} \
    --driver-memory ${DRIVER_MEMORY} \
    --num-executors ${NUM_EXECUTORS} \
    --executor-memory ${EXECUTOR_MEMORY} \
    --executor-cores ${EXECUTOR_CORES} \
    --queue ${queueNameSparkStreaming} \
    ${APP_LIB_HDFS_PATH} ${APP_CONF_HDFS_PATH}/${LOAD_CONF_FILE}

jaas.conf:

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="./user.keytab"
  useTicketCache=false
  serviceName="kafka"
  principal="user@BRUBLES";
};
Gregory Iyama