
I have an RDD that I wish to write to HDFS.

data.saveAsTextFile("hdfs://path/vertices")

This returns: WARN RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over null. Not retrying because try once and fail. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]

I have checked Kerberos and I am properly authenticated.

How do I solve this?

LearningSlowly
  • Is your Kerberos using a keytab, or a login and password? – Thiago Baldim Nov 10 '16 at 19:36
  • It uses Keytab Thiago. – LearningSlowly Nov 10 '16 at 19:37
  • Did you check the Hadoop `core-site.xml` that Spark driver uses, and does it explicitly state that clients **must** use Kerberos? Because that error message is typical of lame Hadoop conf on client side. – Samson Scharfrichter Nov 13 '16 at 18:58
  • Hi Samson. in the `core-site.xml` I have ``` hadoop.security.authentication kerberos ``` and ``` hadoop.http.authentication.type kerberos ``` and ``` hadoop.http.authentication.simple.anonymous.allowed false ``` Does this satisfy? – LearningSlowly Nov 14 '16 at 06:36
  • I've also tried adding `--keytab` to `spark-submit` as per https://www.cloudera.com/documentation/enterprise/5-5-x/topics/sg_spark_auth.html – LearningSlowly Nov 14 '16 at 07:53

2 Answers


Well,

You need to check the path /etc/security/keytabs and verify that your Spark keytab is there.

This is the recommended path for Kerberos configuration, but it may be located elsewhere on your system.

Most importantly, this keytab should be present on all worker machines at the same path.

Another thing you can check is Spark's configuration directory, which should be:

SPARK_HOME/conf

This folder should contain the Spark conf file spark-defaults.conf, and that file needs these settings:

spark.history.kerberos.enabled true
spark.history.kerberos.keytab /etc/security/keytabs/spark.keytab
spark.history.kerberos.principal user@DOMAIN.LOCAL
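As mentioned in the question's comments, if you cannot edit spark-defaults.conf yourself, the same credentials can also be passed to spark-submit on the command line (a sketch; the principal, keytab path, class name, and jar name below are placeholders, not values taken from your cluster):

```shell
# Sketch: passing Kerberos credentials directly to spark-submit
# (these flags are supported when running on YARN).
# All values below are placeholders for your own.
spark-submit \
  --principal user@DOMAIN.LOCAL \
  --keytab /etc/security/keytabs/spark.keytab \
  --class com.example.MyJob \
  myjob.jar
```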
Thiago Baldim
  • Thanks Thiago. I can see it in master and all of the worker machines. Hmmmm – LearningSlowly Nov 10 '16 at 19:49
  • In your spark conf the variable spark.history.kerberos.keytab has the path right? – Thiago Baldim Nov 10 '16 at 19:56
  • Is this relevant? http://stackoverflow.com/questions/31707722/how-to-add-configuration-file-to-classpath-of-all-spark-executors-in-spark-1-2-0 – LearningSlowly Nov 10 '16 at 20:15
  • Sorry for the delay! The path of a default Spark installation should be `/etc/spark/conf`, but to make it easy it will be in `SPARK_HOME/conf`. – Thiago Baldim Nov 11 '16 at 12:03
  • Thanks Thiago. Does the fact that it says "Available:[TOKEN, KERBEROS]" mean that it is in the `conf` file? I don't have access to the `conf` file unfortunately! – LearningSlowly Nov 11 '16 at 17:44
  • @LearningSlowly it probably is. The point is that Kerberos asks for authentication when you access Hadoop, Kafka, Hive, etc. in your cluster. You have Kerberos, but your Spark doesn't have the option enabled. This is probably the issue. I will update the answer. – Thiago Baldim Nov 11 '16 at 20:39

The issue was actually related to how you reference a file in HDFS when using Kerberos.

Rather than hdfs://<HOST>:<HTTP_PORT>

It is webhdfs://<HOST>:<HTTP_PORT>
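A minimal sketch of the fix, assuming an RDD like the one in the question; `<HOST>` and `<HTTP_PORT>` remain cluster-specific placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: write the RDD through the webhdfs:// scheme instead of hdfs://.
// <HOST> and <HTTP_PORT> are placeholders for the NameNode host and its HTTP port.
val sc = new SparkContext(new SparkConf().setAppName("webhdfs-write"))
val data = sc.parallelize(Seq("v1", "v2", "v3"))
data.saveAsTextFile("webhdfs://<HOST>:<HTTP_PORT>/path/vertices")
```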

LearningSlowly