Questions tagged [livy]

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface

From http://livy.incubator.apache.org.

What is Apache Livy?

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, and Spark Context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers, thus enabling the use of Spark for interactive web/mobile applications. Additional features include:

  • Long-running Spark Contexts that can be used for multiple Spark jobs, by multiple clients
  • Sharing cached RDDs or DataFrames across multiple jobs and clients
  • Managing multiple Spark Contexts simultaneously, with the Spark Contexts running on the cluster (YARN/Mesos) instead of the Livy server, for good fault tolerance and concurrency
  • Submitting jobs as precompiled jars, snippets of code, or via the Java/Scala client API
  • Security via secure authenticated communication
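The REST interface described above can be sketched with the stdlib alone. This is a minimal, hedged example of building requests against Livy's documented session endpoints (POST /sessions to start an interactive session, POST /sessions/{id}/statements to run code); the host name is a placeholder, and 8998 is Livy's default port:

```python
import json
from urllib import request

LIVY_URL = "http://livy-server:8998"  # hypothetical host; 8998 is Livy's default port

def livy_request(method, path, payload=None):
    """Build an urllib Request for a Livy REST endpoint."""
    data = json.dumps(payload).encode() if payload is not None else None
    return request.Request(
        LIVY_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )

# 1. Create an interactive PySpark session (POST /sessions).
create = livy_request("POST", "/sessions", {"kind": "pyspark"})

# 2. Submit a code snippet to session 0 (POST /sessions/{id}/statements).
stmt = livy_request("POST", "/sessions/0/statements",
                    {"code": "sc.parallelize(range(10)).sum()"})

# Against a live server you would then run, e.g.:
#   resp = json.load(request.urlopen(create))
#   session_id = resp["id"]
# and poll GET /sessions/{id}/statements/{sid} for the (possibly
# asynchronous) result.
```

The same two-step shape (create a context once, then send many statements to it) is what makes the long-running, shared Spark Contexts in the feature list possible.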

288 questions
0 votes, 0 answers

How to spark-submit from Livy without using proxyUser (Kerberos)?

Right now I can submit Spark jobs over Livy with the spark-submit command, and the command includes a --proxy-user livy parameter so Livy can impersonate spark and run the spark-submit. However, I want to know how to do this without having the…
JYCH
0 votes, 0 answers

How to get the password from Kerberos principal

For all the auto-generated Kerberos principals, for example HDFS, Hadoop, and Livy, how can I get their passwords so that I can try kinit with them? I created a Kerberized cluster in AWS EMR and by default it auto-generated all these principals, and now…
JYCH
0 votes, 1 answer

Error while submitting PySpark Application through Livy REST API

I want to submit a PySpark application to Livy through the REST API to invoke the Hive Warehouse Connector. Based on this answer in Cloudera…
Tinniam V. Ganesh
0 votes, 0 answers

Submitting Spark jobs through Apache Livy to AWS EMR in a Queue

We have made a jar of our Spark (Scala) code and uploaded it to AWS EMR through S3. We intend to run this Spark code using Apache Livy. After copying the jar to the cluster, we run the following commands to make the jar accessible to Livy: hadoop…
Saad Zia
0 votes, 1 answer

Passing Python modules on HDFS through Livy

On the /user/usr1/ path in HDFS, I placed two scripts, pySparkScript.py and relatedModule.py. relatedModule.py is a Python module that will be imported into pySparkScript.py. I can run the scripts with spark-submit pySparkScript.py. However, I need…
Paam
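For the scenario in the question above — a main script that imports a sibling module — Livy's batch endpoint (POST /batches) accepts a "pyFiles" list of extra files to ship onto the executors' PYTHONPATH. A hedged sketch of the payload, using the HDFS paths from the question:

```python
import json

# Sketch, assuming the two files sit at /user/usr1/ on HDFS as described.
# "file" is the main script; "pyFiles" lists modules it imports.
payload = {
    "file": "hdfs:///user/usr1/pySparkScript.py",
    "pyFiles": ["hdfs:///user/usr1/relatedModule.py"],
}

body = json.dumps(payload)
# POST this body to http://<livy-host>:8998/batches with
# Content-Type: application/json (e.g. via curl or urllib).
```

The same pattern extends to "files", "jars", and "archives" for other dependency types.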
0 votes, 1 answer

Does Spark 2.4.4 support forwarding Delegation Tokens when master is k8s?

I'm currently in the process of setting up a Kerberized environment for submitting Spark jobs using Livy in Kubernetes. What I've achieved so far: a running Kerberized HDFS cluster, Livy using SPNEGO, and Livy submitting jobs to k8s and spawning Spark…
denglai
0 votes, 1 answer

AWS: EMR with multi-master node setup. How to get the active master node

We currently have a multi-master node setup in AWS. Livy is installed on all 3 nodes. Is there any endpoint which can tell which of the three master nodes is the currently active one? We are trying to run Spark jobs via Livy.
naval jain
0 votes, 1 answer

Using logback.xml in Apache Livy

I am trying to use logback.xml with the Apache Livy REST API and I'm having trouble getting it to work. I've tried submitting the logback.xml path as follows: data = { "file" : "", "className" : "", "files":…
rasthiya
0 votes, 1 answer

jupyter notebook pyspark sparkmagic error when I use inline sql magic

I have successfully configured the PySpark kernel in Jupyter notebook, and I also installed Sparkmagic. When I try to use the command below: %%sql SELECT DepDelay, ArrDelay FROM flightData it starts working, and then suddenly Spark stops, throwing the below…
M. Wadi
0 votes, 1 answer

In AWS EMR Jupyter Notebook, how to change the user from livy to hadoop

I have created an AWS EMR cluster and uploaded sparkify_log_small.json, and created an EMR Jupyter Notebook with the below code, thinking it would read from the user's (hadoop) home directory. sparkify_log_data = "sparkify_log_small.json" df =…
0 votes, 0 answers

jupyter notebook error when Starting Spark application using pyspark kernel

I've been trying to configure Jupyter notebook and the PySpark kernel. I am actually new to this and to the Ubuntu OS. When I tried to run some code in the Jupyter notebook using the PySpark kernel, I received the error log below. Note that it used to work before…
M. Wadi
0 votes, 1 answer

How does a Livy REST API call work?

I am getting started with Apache Livy. I was able to follow the online documentation and submit a Spark job through curl (I have posted another question on converting curl to a REST call). My plan was to try it out with curl and then convert…
Explorer
0 votes, 1 answer

How to convert a Livy curl call to a Livy REST API call

I am getting started with Livy. In my setup, the Livy server is running on a Unix machine and I am able to curl to it and execute the job. I have created a fat jar, uploaded it to HDFS, and I am simply calling its main method from Livy. My JSON…
Explorer
0 votes, 1 answer

PySpark DataFrame is giving an error with show()

I am using a Zeppelin notebook with the %livy.pyspark interpreter. I am running a SQL query on a Hadoop Hive table and want to see a few lines of the table. I am using the below code: ''' %livy.pyspark from pyspark.sql import HiveContext sqlContext =…
user151444
0 votes, 2 answers

Submitting Spark jobs over Livy using curl

I'm submitting Spark jobs to a Livy (0.6.0) session through curl. The jobs are a big jar file that extends the Job interface, exactly like this: https://stackoverflow.com/a/49220879/8557851 Actually, when running this code using this curl…
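Several of the questions above boil down to submitting a precompiled jar over Livy's REST interface. A hedged sketch of the documented batch endpoint (POST /batches), equivalent to the usual curl invocation; the host, jar path, and class name are placeholders, not taken from any question:

```python
import json
from urllib import request

LIVY_URL = "http://livy-server:8998"  # hypothetical host; 8998 is Livy's default port

# Batch payload for a precompiled jar (POST /batches). The HDFS path and
# class name below are illustrative placeholders.
payload = {
    "file": "hdfs:///jars/spark-app.jar",  # jar uploaded to the cluster
    "className": "com.example.SparkApp",   # main class inside the jar
    "args": ["arg1"],
}

req = request.Request(
    LIVY_URL + "/batches",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With a live Livy server, json.load(request.urlopen(req)) returns the
# batch id and state, which you can then poll at GET /batches/{id}.
```

The equivalent curl would be `curl -X POST -H 'Content-Type: application/json' -d '<payload>' http://livy-server:8998/batches`; jobs written against the programmatic Job interface can alternatively be submitted through the Java/Scala LivyClient API instead of the REST batch endpoint.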