
I'm trying to deploy a Spark application to a Kerberized Hadoop cluster managed by YARN. The Spark version is 1.5.0-cdh5.5.2.

I'm facing a strange exception when I stop a SparkContext after more than 10 seconds of idle time and then initialize a new one.

I've tried doing something similar to what this developer did and explicitly specified the HDFS namenode address, but it didn't help.

What is even more confusing is that everything works fine if I don't reset the SparkContext at all, or if I reset it less than ~10 seconds after the last command was executed in that context.

How could I fix it?

Here is a minimized case where the problem occurs:

package demo;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class App {
    public static void main(String[] args) throws Exception {

        SparkConf sparkConf = new SparkConf();
        sparkConf.setAppName("demo");
        sparkConf.set("spark.yarn.access.namenodes", "hdfs://hdp:8020");

        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        int waitingTime = 10;
        System.out.println("Waiting time: " + waitingTime);
        Thread.sleep(waitingTime * 1000);

        jsc.stop();

        jsc = new JavaSparkContext(sparkConf); // "Delegation token ..." exception here
    }
}

Stack trace when the exception is raised: https://gist.github.com/anonymous/18e15010010069b119aa0934d6f42726

spark-submit command:

spark-submit --principal mp@LAGOON --keytab mp.keytab --master yarn-client --class demo.App demo.jar
  • It looks like this exception is not the root cause. Try checking the NameNode logs to see what the real issue is. – AdamSkywalker Nov 27 '16 at 22:12
  • I've done some investigation and found out that nameNode.getDelegationToken is called while the SparkContext is being constructed, even if a delegation token is already present in the token list of the currently logged-in user in the UserGroupInformation object (see the inspection sketch below). The problem doesn't occur when the waiting time before constructing a new context is less than 10 seconds, because the RPC connection to the namenode simply hasn't been reset yet due to the ipc.client.connection.maxidletime property. – Alexey Klimov Dec 15 '16 at 22:01
  • Is this a bug, or is it just that I can't use SparkContext this way? – Alexey Klimov Dec 15 '16 at 22:02
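
For reference, here is a minimal sketch of how the token list mentioned in the comments above can be inspected with the standard Hadoop UserGroupInformation API (the TokenDump class name is made up for illustration):

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class TokenDump {
    public static void main(String[] args) throws Exception {
        UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
        System.out.println("Current user: " + ugi.getUserName());
        // Print every token (including any HDFS delegation token) held by this user.
        for (Token<?> token : ugi.getCredentials().getAllTokens()) {
            System.out.println(token.getKind() + " -> " + token.getService());
        }
    }
}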

2 Answers


The problem was caused by this issue: https://issues.apache.org/jira/browse/SPARK-15754

It was fixed in Spark 1.6.2.
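
If upgrading is not an option, a workaround consistent with the observation in the question (everything works as long as the context is never reset) is to reuse a single JavaSparkContext for the lifetime of the application instead of stopping and recreating it. A minimal sketch; the SharedContext holder class is hypothetical:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical holder that hands out one shared context instead of
// recreating it, avoiding the delegation-token fetch on reconstruction.
public final class SharedContext {
    private static JavaSparkContext jsc;

    public static synchronized JavaSparkContext get(SparkConf conf) {
        if (jsc == null) {
            jsc = new JavaSparkContext(conf);
        }
        return jsc;
    }

    public static synchronized void shutdown() {
        if (jsc != null) {
            jsc.stop(); // stop exactly once, at application exit
            jsc = null;
        }
    }
}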


For me, re-logging in every time solves the problem:

import java.util.{Timer, TimerTask}

import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.sql.SparkSession

object TokenRenew {

  def main(args: Array[String]): Unit = {

    val timer = new Timer()
    timer.schedule(new TimerTask {
      override def run(): Unit = {
        // Drop the cached login state and log in again from the keytab
        // before each new SparkSession is created.
        UserGroupInformation.reset()
        UserGroupInformation.loginUserFromKeytab("xxx", "/path/to/keytab")
        val spark = SparkSession.builder()
          .appName("TokenRenew")
          .getOrCreate()
        spark.read.csv("/tmp/test.txt").show()
        spark.stop()
      }
    }, 0, 1000 * 60) // run immediately, then once a minute
  }
}
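
The one-minute period in the TimerTask above is just this particular job's schedule; as far as I can tell, the essential part is calling UserGroupInformation.reset() followed by loginUserFromKeytab(...) before building each new SparkSession, so the context is constructed under a fresh Kerberos login and fetches fresh delegation tokens.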