
I don't know how to pass SparkSession parameters programmatically when submitting a Spark job to Apache Livy.

This is the Test Spark job:

import org.apache.livy.{Job, JobContext}
import org.apache.spark.sql.SparkSession

class Test extends Job[Int] {

  override def call(jc: JobContext): Int = {

    val spark: SparkSession = jc.sparkSession()

    // ...

  }
}

This is how this Spark job is submitted to Livy:

import java.io.File
import java.net.URI
import org.apache.livy.LivyClientBuilder

val client = new LivyClientBuilder()
  .setURI(new URI(livyUrl))
  .build()

try {
  client.uploadJar(new File(testJarPath)).get()

  client.submit(new Test())

} finally {
  client.stop(true)
}

How can I pass the following configuration parameters to SparkSession?

  .config("es.nodes","1localhost")
  .config("es.port",9200)
  .config("es.nodes.wan.only","true")
  .config("es.index.auto.create","true")
– Markus

3 Answers


You can do that easily through the LivyClientBuilder like this:

val client = new LivyClientBuilder()
  .setURI(new URI(livyUrl))
  .setConf("es.nodes", "localhost")
  .setConf("key", "value")
  .build()
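
Using the configured client is then the same as in the question. A small sketch (reusing the asker's Test job and testJarPath; JobHandle extends java.util.concurrent.Future, so get() blocks for the result):

try {
  client.uploadJar(new File(testJarPath)).get()

  // submit() returns a JobHandle[Int]; get() waits for the job to finish
  val result: Int = client.submit(new Test()).get()
  println(s"Test returned $result")
} finally {
  client.stop(true)
}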
– Lokesh Yadav

Configuration parameters can be set on the LivyClientBuilder using

public LivyClientBuilder setConf(String key, String value)

so that your code starts with:

val client = new LivyClientBuilder()
  .setURI(new URI(livyUrl))
  .setConf("es.nodes", "localhost")
  .setConf("es.port", "9200")
  .setConf("es.nodes.wan.only", "true")
  .setConf("es.index.auto.create", "true")
  .build()
– pcejrowski
  • I am trying to use the conf field in the [livy post call](https://livy.incubator.apache.org/docs/latest/rest-api.html) to set `spark.network.timeout` to `600s`. Is there a way to verify whether the value actually got set? I have a [question on this as well](https://stackoverflow.com/questions/55690915/how-to-check-spark-config-for-an-application-in-ambari-ui-posted-with-livy). – Sayantan Ghosh Apr 16 '19 at 14:49
  • 1
    I am in the same boat, posting the conf via the REST API, and it appears Livy is not passing it to the spark context – willredington315 May 22 '20 at 16:40
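
Regarding verification, one option (a sketch; CheckConf is a hypothetical helper job, and `spark.network.timeout` is just the key from the comment above) is to read the value back from the SparkConf inside a job:

import org.apache.livy.{Job, JobContext}
import org.apache.spark.sql.SparkSession

// Reads a config key back so you can see what the Spark side actually received
class CheckConf extends Job[String] {
  override def call(jc: JobContext): String = {
    val spark: SparkSession = jc.sparkSession()
    spark.sparkContext.getConf.get("spark.network.timeout", "<not set>")
  }
}

// client.submit(new CheckConf()).get() then returns the effective value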

LivyClientBuilder.setConf will not work, I think, because Livy modifies every config key that does not start with spark., and Spark cannot read the modified keys. See this snippet from Livy's source:

private static File writeConfToFile(RSCConf conf) throws IOException {
  Properties confView = new Properties();
  for (Map.Entry<String, String> e : conf) {
    String key = e.getKey();
    // Any key that does not already start with "spark." gets Livy's
    // internal prefix prepended, so Spark no longer recognizes it.
    if (!key.startsWith(RSCConf.SPARK_CONF_PREFIX)) {
      key = RSCConf.LIVY_SPARK_PREFIX + key;
    }
    confView.setProperty(key, e.getValue());
  }
  // ...
}
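
To illustrate the effect (a sketch; assuming LIVY_SPARK_PREFIX is "spark.__livy__.", as defined in Livy's RSCConf):

val rewrite = (key: String) =>
  if (key.startsWith("spark.")) key else "spark.__livy__." + key

rewrite("es.nodes")       // becomes "spark.__livy__.es.nodes" -- invisible to Spark
rewrite("spark.es.nodes") // stays "spark.es.nodes" -- passed through unchanged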

So the answer is quite simple: prefix all the es.* configs with spark., like this:

  .config("spark.es.nodes","1localhost")
  .config("spark.es.port",9200)
  .config("spark.es.nodes.wan.only","true")
  .config("spark.es.index.auto.create","true")

I don't know whether it is elasticsearch-spark or Spark itself that handles the compatibility; it just works.

[Screenshot: Spark UI showing the configs]

PS: I've tried this with the REST API and it works; I haven't tried it with the programmatic API.
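
Carrying the same trick over to the programmatic API from the other answers would presumably look like this (a sketch, untested here per the PS above):

val client = new LivyClientBuilder()
  .setURI(new URI(livyUrl))
  .setConf("spark.es.nodes", "localhost")
  .setConf("spark.es.port", "9200")
  .setConf("spark.es.nodes.wan.only", "true")
  .setConf("spark.es.index.auto.create", "true")
  .build()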

– debugging