
I am trying to connect to HDFS locally from IntelliJ installed on my laptop. The cluster I am trying to connect to is Kerberized, with an edge node. I generated a keytab for the edge node and configured it in the code below. I can now log in to the edge node, but when I try to access the HDFS data on the NameNode it throws an error. Below is the Scala code that tries to connect to HDFS:

import org.apache.spark.sql.SparkSession
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation
import java.security.PrivilegedExceptionAction
import java.io.PrintWriter

object DataframeEx {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local")
      .appName("Spark SQL basic example")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()

    runHdfsConnect(spark)

    spark.stop()
  }

  def runHdfsConnect(spark: SparkSession): Unit = {
    System.setProperty("HADOOP_USER_NAME", "m12345")
    val path = new Path("/data/interim/modeled/abcdef")
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://namenodename.hugh.com:8020")
    conf.set("hadoop.security.authentication", "kerberos")
    conf.set("dfs.namenode.kerberos.principal.pattern", "hdfs/_HOST@HUGH.COM")

    // Log in from the keytab and run the HDFS write as that user
    UserGroupInformation.setConfiguration(conf)
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      "m12345@HUGH.COM", "C:\\Users\\m12345\\Downloads\\m12345.keytab")

    println(UserGroupInformation.isSecurityEnabled())
    ugi.doAs(new PrivilegedExceptionAction[String] {
      override def run(): String = {
        val fs = FileSystem.get(conf)
        val output = fs.create(path)
        val writer = new PrintWriter(output)
        try {
          writer.write("this is a test")
          writer.write("\n")
        } finally {
          writer.close()
          println("Closed!")
        }
        "done"
      }
    })
  }
}

I am able to log in to the edge node, but when I try to write to HDFS (the doAs block) it throws the following error:

WARN Client: Exception encountered while connecting to the server : java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/namenodename.hugh.com@HUGH.COM
18/06/11 12:12:01 ERROR UserGroupInformation: PriviledgedActionException m12345@HUGH.COM (auth:KERBEROS) cause:java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/namenodename.hugh.com@HUGH.COM
18/06/11 12:12:01 ERROR UserGroupInformation: PriviledgedActionException as:m12345@HUGH.COM (auth:KERBEROS) cause:java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/namenodename.hugh.com@HUGH.COM; Host Details : local host is: "INMBP-m12345/172.29.155.52"; destination host is: "namenodename.hugh.com":8020; 
Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/namenodename.hugh.com@HUGH.COM; Host Details : local host is: "INMBP-m12345/172.29.155.52"; destination host is: "namenodename.hugh.com":8020

If I log in to the edge node, do a kinit, and then access HDFS, it works fine. So why can I not access the HDFS NameNode from my laptop when I am able to log in to the edge node?

Let me know if any more details are needed from my side.

Carol
  • Do you use OpenJDK, or Oracle Java with JCE (http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html), on your laptop? – Harold Jun 20 '18 at 15:31
  • About the _"why"_ in general: Kerberos is about strong authentication over a (possibly hostile) network. It is over 30 years old, requires fine-tuned configuration on client side; the Java implementation is, ahem, approximative; and Hadoop replaced some of the lame Java parts by its own oddities. Read https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/ to ponder that. – Samson Scharfrichter Jun 20 '18 at 21:12
  • About your code: it's absurdly complex, unless you are running a proxy server that runs commands on behalf of multiple clients, with each client's credentials. Just use the **static** `UserGroupInformation` that is sufficient in 99.9% of all use cases -- with a single call to `loginUserFromKeytab()` (a minimal sketch follows after these comments). – Samson Scharfrichter Jun 20 '18 at 21:16
  • About Hadoop and Kerberos and Windows: you need a Windows version of the Hadoop "native libs", because some jerk packaged a server-side unit test in the client-side API. Cf. https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html (some parts are specific to Spark but the rest applies to the hadoop libs that Spark uses) -- I could do some nit-picking about using env var `%PATH%` instead of Java property `java.library.path`, and `%HADOOP_HOME%` instead of `hadoop.home.dir` but that's a good start – Samson Scharfrichter Jun 20 '18 at 21:28
  • To enable the (verbose) Kerberos trace in JAAS, set `-Dsun.security.krb5.debug=true -Djava.security.debug=gssloginconfig,configfile,configparser,logincontext`; to enable the (verbose) Kerberos trace in the Hadoop extensions, set the env variable `HADOOP_JAAS_DEBUG` to `true`. It will sting your eyes, but comparing a working connection to a failed connection will help you zoom in on the root cause, with some trial and error (see the snippet after these comments for setting these from code). – Samson Scharfrichter Jun 20 '18 at 21:35
  • The error tells me that the server (assuming it means the NameNode) has an invalid Kerberos principal: hdfs/namenodename.hugh.com@HUGH.COM. Is there any way I could resolve this? Is it an issue with the hdfs-site.xml config file? – Carol Jun 21 '18 at 04:39
  • @Harold I am using OpenJDK – Carol Jun 21 '18 at 04:44
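
Following up on Samson's static-`UserGroupInformation` suggestion, here is a minimal sketch of that simpler approach. It reuses the principal, keytab path, and HDFS path from the question (assumptions, not verified values):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://namenodename.hugh.com:8020")
conf.set("hadoop.security.authentication", "kerberos")

// One static login for the whole JVM: this sets the process-wide current user,
// so the FileSystem calls that follow need no doAs wrapper.
UserGroupInformation.setConfiguration(conf)
UserGroupInformation.loginUserFromKeytab(
  "m12345@HUGH.COM", "C:\\Users\\m12345\\Downloads\\m12345.keytab")

val fs = FileSystem.get(conf)
println(fs.exists(new Path("/data/interim/modeled/abcdef")))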
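
And to turn on the Kerberos traces from inside IntelliJ without editing the run configuration, one option is to set the system properties at the very top of main, before any Kerberos or Hadoop security class is loaded. This is a sketch; whether `java.security.debug` is picked up this late depends on class-loading order, so the `-D` JVM options remain the safer route:

// Equivalent of the -D JVM options from the comment above; must run before
// the first UserGroupInformation call.
System.setProperty("sun.security.krb5.debug", "true")
System.setProperty("java.security.debug",
  "gssloginconfig,configfile,configparser,logincontext")
// HADOOP_JAAS_DEBUG is an environment variable, so set it in the IntelliJ run
// configuration (Run > Edit Configurations > Environment variables), not here.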

1 Answer


The Hadoop Configuration object was set incorrectly. Below is what worked for me:

val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://namenodename.hugh.com:8020")
conf.set("hadoop.security.authentication", "kerberos")
conf.set("hadoop.rpc.protection", "privacy")  // was missing this parameter
conf.set("dfs.namenode.kerberos.principal", "hdfs/_HOST@HUGH.COM")  // was initially wrongly set as dfs.namenode.kerberos.principal.pattern
Carol