Questions tagged [livy]

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface

From http://livy.incubator.apache.org.

What is Apache Livy?

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, and Spark Context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers, thus enabling the use of Spark for interactive web/mobile applications. Additional features include:

  • Long-running Spark Contexts that can be used for multiple Spark jobs, by multiple clients
  • Sharing cached RDDs or DataFrames across multiple jobs and clients
  • Managing multiple Spark Contexts simultaneously, with the Spark Contexts running on the cluster (YARN/Mesos) instead of the Livy server, for good fault tolerance and concurrency
  • Submitting jobs as precompiled jars, snippets of code, or via the Java/Scala client API
  • Security via secure authenticated communication
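The REST interface described above can be sketched with the stdlib alone. This is a minimal, hedged example of building requests against Livy's documented session endpoints (POST /sessions to start an interactive session, POST /sessions/{id}/statements to run code); the host name is a placeholder, and 8998 is Livy's default port:

```python
import json
from urllib import request

LIVY_URL = "http://livy-server:8998"  # hypothetical host; 8998 is Livy's default port

def livy_request(method, path, payload=None):
    """Build an urllib Request for a Livy REST endpoint."""
    data = json.dumps(payload).encode() if payload is not None else None
    return request.Request(
        LIVY_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )

# 1. Create an interactive PySpark session (POST /sessions).
create = livy_request("POST", "/sessions", {"kind": "pyspark"})

# 2. Submit a code snippet to session 0 (POST /sessions/{id}/statements).
stmt = livy_request("POST", "/sessions/0/statements",
                    {"code": "sc.parallelize(range(10)).sum()"})

# Against a live server you would then run, e.g.:
#   resp = json.load(request.urlopen(create))
#   session_id = resp["id"]
# and poll GET /sessions/{id}/statements/{sid} for the (possibly
# asynchronous) result.
```

The same two-step shape (create a context once, then send many statements to it) is what makes the long-running, shared Spark Contexts in the feature list possible.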

288 questions
0 votes, 0 answers

How to spark-submit from Livy without using proxyUser (Kerberos)?

Right now I can submit Spark jobs over Livy with the spark-submit command, and the command includes a --proxy-user livy parameter so Livy can impersonate spark and run the spark-submit. However, I want to know how to do this without having the…
JYCH
0 votes, 0 answers

How to get the password from Kerberos principal

For all the auto-generated Kerberos principals, for example HDFS, Hadoop, and Livy, how can I get their passwords so that I can try kinit with them? I created a Kerberized cluster in AWS EMR and by default it auto-generated all these principals, and now…
JYCH
0 votes, 1 answer

Error while submitting PySpark Application through Livy REST API

I want to submit a PySpark application to Livy through the REST API to invoke the Hive Warehouse Connector. Based on this answer in Cloudera…
Tinniam V. Ganesh
0 votes, 0 answers

Submitting Spark jobs through Apache Livy to AWS EMR in a Queue

We have made a jar of our Spark (Scala) code and uploaded it to AWS EMR through S3. We intend to run this Spark code using Apache Livy. After copying the jar to the cluster, we run the following commands to make the jar accessible to Livy: hadoop…
Saad Zia
0 votes, 1 answer

Passing Python modules on HDFS through Livy

On the /user/usr1/ path in HDFS, I placed two scripts, pySparkScript.py and relatedModule.py. relatedModule.py is a Python module that will be imported into pySparkScript.py. I can run the scripts with spark-submit pySparkScript.py. However, I need…
Paam
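For the scenario in the question above — a main script that imports a sibling module — Livy's batch endpoint (POST /batches) accepts a "pyFiles" list of extra files to ship onto the executors' PYTHONPATH. A hedged sketch of the payload, using the HDFS paths from the question:

```python
import json

# Sketch, assuming the two files sit at /user/usr1/ on HDFS as described.
# "file" is the main script; "pyFiles" lists modules it imports.
payload = {
    "file": "hdfs:///user/usr1/pySparkScript.py",
    "pyFiles": ["hdfs:///user/usr1/relatedModule.py"],
}

body = json.dumps(payload)
# POST this body to http://<livy-host>:8998/batches with
# Content-Type: application/json (e.g. via curl or urllib).
```

The same pattern extends to "files", "jars", and "archives" for other dependency types.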
0 votes, 1 answer

Does Spark 2.4.4 support forwarding Delegation Tokens when master is k8s?

I'm currently in the process of setting up a Kerberized environment for submitting Spark jobs using Livy in Kubernetes. What I've achieved so far: a running Kerberized HDFS cluster, Livy using SPNEGO, and Livy submitting jobs to k8s and spawning Spark…
denglai
0 votes, 1 answer

AWS: EMR with multi-master node setup. How to get the active master node

We currently have a multi-master node setup in AWS. Livy is installed on all 3 nodes. Is there any endpoint which can tell which of the three master nodes is the currently active one? We are trying to run Spark jobs via Livy.
naval jain
0 votes, 1 answer

Using logback.xml in Apache Livy

I am trying to use logback.xml with the Apache Livy REST API and I'm having trouble getting it to work. I've tried submitting the logback.xml path as follows: data = { "file" : "", "className" : "", "files":…
rasthiya
0 votes, 1 answer

jupyter notebook pyspark sparkmagic error when I use inline sql magic

I have successfully configured the PySpark kernel in Jupyter notebook, and I also installed Sparkmagic. When I try to use the command below: %%sql SELECT DepDelay, ArrDelay FROM flightData it starts working, and then suddenly Spark stops, throwing the below…
M. Wadi
0 votes, 1 answer

In AWS EMR Jupyter Notebook, how to change the user from livy to hadoop

I have created an AWS EMR cluster and uploaded sparkify_log_small.json, and created an EMR Jupyter Notebook with the below code, thinking it would read from the user's (hadoop) home directory. sparkify_log_data = "sparkify_log_small.json" df =…
0 votes, 0 answers

jupyter notebook error when Starting Spark application using pyspark kernel

I've been trying to configure Jupyter notebook and the PySpark kernel. I am actually new to this and to the Ubuntu OS. When I tried to run some code in the Jupyter notebook using the PySpark kernel, I received the error log below. Note that it used to work before…
M. Wadi
0 votes, 1 answer

How does a Livy REST API call work?

I am getting started with Apache Livy. I was able to follow the online documentation and submit a Spark job through curl (I have posted another question on converting curl to a REST call). My plan was to try it out with curl and then convert…
Explorer
0 votes, 1 answer

How to convert a Livy curl call to a Livy REST API call

I am getting started with Livy. In my setup, the Livy server is running on a Unix machine and I am able to curl to it and execute the job. I have created a fat jar, uploaded it to HDFS, and I am simply calling its main method from Livy. My JSON…
Explorer
0 votes, 1 answer

PySpark DataFrame is giving an error with show()

I am using a Zeppelin notebook with the %livy.pyspark interpreter. I am running a SQL query on a Hadoop Hive table and want to see a few lines of the table. I am using the below code: ''' %livy.pyspark from pyspark.sql import HiveContext sqlContext =…
user151444
0 votes, 2 answers

Submitting Spark jobs over Livy using curl

I'm submitting Spark jobs to a Livy (0.6.0) session through curl. The jobs are a big jar file that extends the Job interface, exactly like this: https://stackoverflow.com/a/49220879/8557851 Actually, when running this code using this curl…
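Several of the questions above boil down to submitting a precompiled jar over Livy's REST interface. A hedged sketch of the documented batch endpoint (POST /batches), equivalent to the usual curl invocation; the host, jar path, and class name are placeholders, not taken from any question:

```python
import json
from urllib import request

LIVY_URL = "http://livy-server:8998"  # hypothetical host; 8998 is Livy's default port

# Batch payload for a precompiled jar (POST /batches). The HDFS path and
# class name below are illustrative placeholders.
payload = {
    "file": "hdfs:///jars/spark-app.jar",  # jar uploaded to the cluster
    "className": "com.example.SparkApp",   # main class inside the jar
    "args": ["arg1"],
}

req = request.Request(
    LIVY_URL + "/batches",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With a live Livy server, json.load(request.urlopen(req)) returns the
# batch id and state, which you can then poll at GET /batches/{id}.
```

The equivalent curl would be `curl -X POST -H 'Content-Type: application/json' -d '<payload>' http://livy-server:8998/batches`; jobs written against the programmatic Job interface can alternatively be submitted through the Java/Scala LivyClient API instead of the REST batch endpoint.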