How to submit Spark jobs to Apache Livy?

Question

I am trying to understand how to submit Spark job to Apache Livy.

I added the following API to my POM.xml:

 <dependency>
     <groupId>com.cloudera.livy</groupId>
     <artifactId>livy-api</artifactId>
     <version>0.3.0</version>
 </dependency>

 <dependency>
     <groupId>com.cloudera.livy</groupId>
     <artifactId>livy-scala-api_2.11</artifactId>
     <version>0.3.0</version>
 </dependency>

Then I have the following code in Spark that I want to submit to Livy on request.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object Test {

  def main(args: Array[String]) {

    val spark = SparkSession.builder()
                            .appName("Test")
                            .master("local[*]")
                            .getOrCreate()


    import spark.sqlContext.implicits._

    implicit val sparkContext = spark.sparkContext

    // ...
  }
}

To have the following code that creates a LivyClient instance and uploads the application code to the Spark context:

val client = new LivyClientBuilder()
  .setURI(new URI(livyUrl))
  .build()

try {
  client.uploadJar(new File(testJarPath)).get()

  client.submit(new Test())

} finally {
  client.stop(true)
}

However, the problem is that the code of Test is not adapted to be used with Apache Livy.

How can I adjust the code of Test object in order to be able to run client.submit(new Test())?

score 3 · Accepted Answer · answered Mar 11 '18 at 14:06

Your Test class needs to implement Livy's Job interface and you need to implement its call method in your Test class, from where you will get access to jobContext/SparkContext. You can then pass the instance of Test in the submit method

You don't have to create SparkSession by yourself, Livy will create it on the cluster and you can access that context in your call method.

You can find more detailed information on Livy's programmatic API here: https://livy.incubator.apache.org/docs/latest/programmatic-api.html

Here's a sample implementation of Test Class:

import com.cloudera.livy.{Job, JobContext}

class Test extends Job[Int]{

  override def call(jc: JobContext): Int = {

    val spark = jc.sparkSession()

    // Do anything with SparkSession

    1 //Return value
  }
}

I use `import org.apache.livy._` to make it working, instead of `com.cloudera....` — Markus, Mar 11 '18 at 14:44
I opened a new thread related to this question. You may want to take a look: https://stackoverflow.com/questions/49224564/how-to-pass-parameters-to-sparksession-using-apache-livy — Markus, Mar 11 '18 at 20:43

How to submit Spark jobs to Apache Livy?

1 Answers1

Linked