
My requirement: from a web application, trigger a Spark job on YARN and display the result back in the web page. The Spark job accepts a few arguments and computes a Dataset whose values need to be returned to the web application.

After some browsing on the web, I figured Livy could be used for this.

Livy was already installed with HDP 2.5, so I created a new Livy session using POST /sessions and included my jar file:

{"kind":"spark","name":"livy","jars":["/xyz.jar"],"proxyUser":"livy"}

(I had to include the 'X-Requested-By' header as csrfPrevention was enabled.) Note: the jar had to be placed in HDFS for this to work.
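For illustration, the session request can be issued from plain Java along these lines (the host, port and header value are placeholders, not my actual values; this assumes Java 11's java.net.http client, but any HTTP library works the same way):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateLivySession {
    public static void main(String[] args) throws Exception {
        // Same request body as above; the jar path points into HDFS.
        String body = "{\"kind\":\"spark\",\"name\":\"livy\",\"jars\":[\"/xyz.jar\"],\"proxyUser\":\"livy\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://livy-host:8998/sessions")) // placeholder host, default Livy port
                .header("Content-Type", "application/json")
                .header("X-Requested-By", "admin")                 // needed because csrfPrevention is enabled
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The response JSON contains the new session id and its state.
        System.out.println(response.body());
    }
}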

As per the Livy examples (https://livy.apache.org/examples/), I can pass code snippets such as "data = {'code': '1 + 1'}", but I don't understand how I can invoke the method in my class. There is no 'className' option for statements as per the Livy REST API documentation: https://livy.apache.org/docs/latest/rest-api.html

If I use POST /batches instead, I can specify a jar and my main class, but doing it this way I will not get my result back in my web application.
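For reference, such a batch submission would look something like this (the main class name is a placeholder, since the LivySample class below has no main method):

{"file":"/xyz.jar","className":"com.example.MyMainClass","args":["abc"]}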

Java code in my jar file:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LivySample {

    // Filters a small in-memory RDD and returns the first element containing 'input'.
    public String executeSampleLivy(SparkContext sc, String input) {
        JavaSparkContext jsc = new JavaSparkContext(sc);
        List<String> listNames = Arrays.asList("abc", "def", "ghi");
        JavaRDD<String> rdd = jsc.parallelize(listNames);
        return rdd.filter(l -> l.contains(input)).collect().get(0);
    }
}

I tried to run the below code as a POST to the Livy URL sessions/20/statements:

{
  "code": "import LivySample;LivySample lv = new LivySample();lv.executeSampleLivy(sc, \"abc\")"
}

The error I got when invoking GET sessions/21/statements/0:

{
  "id": 2,
  "state": "available",
  "output": {
    "status": "error",
    "execution_count": 2,
    "ename": "Error",
    "evalue": "<console>:1: error: '.' expected but ';' found. import LivySample;LivySample lv = new LivySample();lv.executeSampleLivy(sc, \"chris\"); ^",
    "traceback": []
  }
}

I am not able to debug this error. Can you please let me know what I am doing wrong here?

Can I use Java in the Livy REST API like I have specified here?

Thanks!

Chris

1 Answer


I'm more familiar with the batches API, but I believe in the session API your application JAR should be supplied in the files field of the request, not jars (paradoxically).

Anyway, a Livy session is basically an interactive spark-shell session. So if you wanted to use sessions, you would step through your program line by line (submitting a request to the RunStatement endpoint, i.e. POST /sessions/{sessionId}/statements, for each line). Then at the end you would ask the GetSessionStatement(s) endpoint (GET /sessions/{sessionId}/statements/{statementId}) for the result.
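For example (untested, reusing the LivySample class from the question), a single statement against a "spark"-kind session carries Scala shell syntax rather than Java declarations, roughly:

POST /sessions/{sessionId}/statements
{"code": "val lv = new LivySample(); lv.executeSampleLivy(sc, \"abc\")"}

The returned value then shows up in the statement's output once GET /sessions/{sessionId}/statements/{statementId} reports it as available.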

Alternatively (and perhaps more easily), you could use the batch API, just write the output to some persistent location, and have your web app expose it when the batch reaches "SUCCESS" state.
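As a rough sketch (the class name, output path and argument handling are only placeholders), the batch's main class could write its result somewhere your web app can read once the batch finishes:

import java.util.Arrays;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class LivyBatchMain {
    public static void main(String[] args) {
        String input = args[0]; // e.g. "abc", passed via the batch request's "args"

        SparkSession spark = SparkSession.builder().appName("livy-batch-sample").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Same toy computation as LivySample, just driven from a main() method.
        String result = jsc.parallelize(Arrays.asList("abc", "def", "ghi"))
                .filter(l -> l.contains(input))
                .collect()
                .get(0);

        // Persist the result to a location the web app can poll after the batch completes.
        spark.createDataset(Arrays.asList(result), Encoders.STRING())
                .write().mode("overwrite").text("hdfs:///tmp/livy-batch-output");

        spark.stop();
    }
}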

user4601931
  • Thanks for your response. – Chris Apr 01 '20 at 19:19
  • I am trying as you have suggested. I create a session first, then query the session and create a statement for that session id. While creating the statement I need to provide a 'code' element in the payload which tells Livy what to execute. This is where I am stuck. Can I provide Java code here? – Chris Apr 01 '20 at 19:29
  • As per the Livy REST API documentation - "Creates a new interactive Scala, Python, or R shell in the cluster." - that is why I doubt whether Java code would work in the interactive session. – Chris Apr 01 '20 at 19:29
  • "Write the output to some persistent location" - This is an alternate approach I will thinking about. But would be easier if I can get Livy to work. Also , I am able to get the response via the programmatic api(LivyClient and Job) that Livy provides. But can you please let me know if there is anything wrong with the 'code' that is used to create the statement which I posted in my question – Chris Apr 01 '20 at 19:29
  • Scala interoperates closely with Java, so you should be able to import Java libraries and call your Java classes directly from a Scala shell. – user4601931 Apr 01 '20 at 19:31
  • Anyway, writing the output somewhere and having your web app pick it up after the job is done is the way I'd do it (for what it's worth). You don't have to incrementally build up the code (if your application comprises 1000 lines, that's 1000 requests to the RunSessionStatement endpoint). And you can just use your JAR without having to worry about any interoperability with Scala. – user4601931 Apr 01 '20 at 19:33
  • Of course that depends on how big the output is and what's hosting your app. – user4601931 Apr 01 '20 at 19:34
  • In my experience, the sessions portion of the API is really designed for interactive exploration of a dataset. One main reason for this is that you can share Spark sessions across Livy sessions (enabling multiple users to interact with the same Spark DataFrame, and with different Spark APIs). The problem you're describing doesn't really fall into this category. – user4601931 Apr 01 '20 at 19:36
  • The code I am trying to run takes a few arguments and generates a Dataset by performing operations on HDFS files, selecting a few thousand rows, so it will be only one request to Livy. The problem I am trying to solve by using Livy is the overhead of persisting the data and reading it back from my application. Do you think this is a bad scenario for using Livy? – Chris Apr 01 '20 at 19:49
  • The Livy session would be created only once during my web application's lifecycle, and then each time the user tries with different attributes a new statement would be created. This is my plan. – Chris Apr 01 '20 at 19:55