
I am rewriting this code:

import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "file:///root/spark/README.md"
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop()
  }
}

to this:

import org.apache.livy._
import org.apache.spark.sql.SparkSession

class Test extends Job[Int] {

  override def call(jc: JobContext): Int = {
  
    val spark = jc.sparkSession()

    val logFile = "file:///root/spark/README.md"
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    
    1 // Return value
  }
}

But when I compile it with sbt, val spark is not recognized correctly and I get the error "value read is not a member of Nothing".

Also, after commenting out the Spark-related code, when I try to run the resulting JAR file using /batches I get the error "java.lang.NoSuchMethodException: Test.main([Ljava.lang.String;)".

Can anybody show me the correct way to rewrite the Spark Scala code?

  • Why do you have to rewrite it? I don't really understand your question, to be honest. You need a main function, or something that extends App, to be able to run it. I am very confused about your question in general. – GamingFelix Aug 25 '20 at 13:13
  • I need to rewrite it because my application is more complicated than this example, and I think pure Scala code is not efficient to run directly in Livy, because I should use the Livy-created Spark session, not my own. – seyyed heydar javadi Aug 26 '20 at 06:32

1 Answer


There's no need to rewrite your Spark application in order to use Livy. Instead, you can use its REST interface to submit jobs on a cluster that has a running Livy server, retrieve logs, get the job state, and so on.
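
That said, if you do want to keep the Job-based rewrite: the "value read is not a member of Nothing" error most likely comes from JobContext.sparkSession() being a generic method (roughly def sparkSession[E](): E on the Scala side), so without an explicit type parameter the compiler infers Nothing. Pinning the type should fix the inference. A minimal sketch, assuming the org.apache.livy Job/JobContext API:

import org.apache.livy.{Job, JobContext}
import org.apache.spark.sql.SparkSession

class Test extends Job[Int] {

  override def call(jc: JobContext): Int = {

    // sparkSession() is generic; without the explicit [SparkSession] type
    // argument the compiler infers Nothing, which is exactly why
    // "value read is not a member of Nothing" appears.
    val spark = jc.sparkSession[SparkSession]()

    val logFile = "file:///root/spark/README.md"
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")

    1 // Return value
  }
}

Also note that Job implementations are meant to be submitted through Livy's programmatic API (a LivyClient, which uploads and runs the JAR), not through /batches. The /batches endpoint performs a spark-submit under the hood, which is why it looks for a main method and fails with "NoSuchMethodException: Test.main" when the class doesn't define one.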

As a practical example, here are instructions to run your application on AWS.

Setup:

  1. Use AWS EMR to create a cluster that has Spark, Livy, and any other applications your job needs preinstalled (a CLI sketch for steps 1 and 2 follows this list).
  2. Upload your JAR to AWS S3.
  3. Make sure that the security group attached to your cluster has an inbound rule that whitelists your IP on port 8998 (Livy's port).
  4. Make sure that your cluster has access to S3 in order to fetch the JAR.
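
As a rough illustration of steps 1 and 2, here is a minimal AWS CLI sketch; the cluster name, release label, instance sizing, bucket, and JAR path are placeholders you would adapt:

aws emr create-cluster \
    --name "spark-livy-cluster" \
    --release-label emr-6.1.0 \
    --applications Name=Spark Name=Livy \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles

aws s3 cp target/scala-2.12/simple-app_2.12-1.0.jar s3://<your-bucket>/simple-app.jar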

Now you'll be able to issue a POST request using cURL (or any equivalent) to submit your application:

curl -H "Content-Type: application/json" -X POST --data '{"className":"<your-package-name>.SimpleApp","file":"s3://<path-to-your-jar>"}' http://<cluster-domain-name>:8998/batches
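
The response contains a batch id; you can use it to poll the job state and retrieve the driver logs through Livy's standard batch endpoints, e.g. for batch 0:

curl http://<cluster-domain-name>:8998/batches/0/state

curl http://<cluster-domain-name>:8998/batches/0/log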
Hedi Bejaoui
  • I can access my JAR file locally (the first, non-rewritten code), and I tested it and it works! Are you saying that is enough, and that no change is required? I previously ran my JAR files using spark-submit. – seyyed heydar javadi Aug 26 '20 at 06:40
  • Another question, @HediBejaoui: if I submit my application as you said, can I still benefit from Livy's frequent data caching mechanism? – seyyed heydar javadi Aug 26 '20 at 06:47
  • @seyyedheydarjavadi Under the hood, Livy will be performing a spark-submit command, so you don't need to make any changes from your side. – Hedi Bejaoui Aug 26 '20 at 12:21
  • @seyyedheydarjavadi Livy is a REST interface to manage Spark sessions and submit snippets of code or a JAR on a cluster. Caching is performed by Spark. – Hedi Bejaoui Aug 26 '20 at 12:22