Questions tagged [livy]

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface

From http://livy.incubator.apache.org.

What is Apache Livy?

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, and Spark context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers, enabling the use of Spark for interactive web/mobile applications. Additional features include:

  • Have long running Spark Contexts that can be used for multiple Spark jobs, by multiple clients
  • Share cached RDDs or Dataframes across multiple jobs and clients
  • Multiple Spark Contexts can be managed simultaneously, and the Spark Contexts run on the cluster (YARN/Mesos) instead of the Livy Server, for good fault tolerance and concurrency
  • Jobs can be submitted as precompiled jars, snippets of code or via java/scala client API
  • Ensure security via secure authenticated communication
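
The interaction model described above can be sketched against Livy's REST API with nothing but the Python standard library; the endpoint, session id, and code snippet below are placeholder assumptions, not part of the tag wiki:

```python
import json
from urllib import request

# Assumed Livy endpoint; adjust host/port for your deployment.
LIVY_URL = "http://localhost:8998"

def build_post(path, payload):
    """Build a POST request for a Livy endpoint. Livy expects JSON
    bodies, and deployments with CSRF protection enabled also require
    an X-Requested-By header."""
    return request.Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Requested-By": "livy-client",
        },
    )

# Create an interactive PySpark session, then run a snippet in it
# (session id 0 is assumed; a real client reads it from the response):
create_session = build_post("/sessions", {"kind": "pyspark"})
run_snippet = build_post("/sessions/0/statements",
                         {"code": "spark.range(100).count()"})
# request.urlopen(create_session)  # uncomment against a live server
```

The same pattern covers batch submission (`POST /batches`), which is what most of the questions below revolve around.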

288 questions
5 votes, 0 answers

What is the difference between livy.rsc.jars and livy.repl.jars?

I'm working on Jupyter Notebooks using the sparkmagic kernel (spark-scala), which relies on Apache Livy to run Spark jobs. I'm currently trying to understand the options to create sessions with user-provided dependencies, i.e., jars. I know in Jupyter I…
Ohtar10
5 votes, 0 answers

Storing Python packages in HDFS for Livy PySpark

I am submitting PySpark jobs to the cluster through Livy. Currently the dependent Python packages like NumPy, Pandas, Keras etc. are installed on all the datanodes. I was wondering if all of these packages can be stored centrally in HDFS and how can…
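
One commonly used approach (a sketch, not the only option): package the Python environment as an archive on HDFS and reference it in the Livy batch payload, pointing the interpreters at the unpacked alias. The paths and the `environment` alias below are assumptions:

```python
# Sketch of a Livy POST /batches payload that ships a Python
# environment from HDFS instead of relying on datanode installs.
# Archive path, job path, and alias are placeholders.
batch_payload = {
    "file": "hdfs:///jobs/my_job.py",
    # "#environment" names the directory the archive unpacks into
    # inside each YARN container.
    "archives": ["hdfs:///envs/venv.tar.gz#environment"],
    "conf": {
        # Point driver (in cluster mode) and executors at the
        # interpreter unpacked from the archive.
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "./environment/bin/python",
        "spark.executorEnv.PYSPARK_PYTHON": "./environment/bin/python",
    },
}
```

The archive has to be built against the same OS/architecture as the cluster nodes for compiled packages like NumPy to work.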
5 votes, 2 answers

How do I run Spark jobs concurrently in the same AWS EMR cluster?

Is it possible to submit and run Spark jobs concurrently in the same AWS EMR cluster? If yes, could you please elaborate?
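
Livy's `/batches` endpoint runs each submission as its own YARN application, so concurrency on one EMR cluster is mostly a question of YARN queue capacity. A minimal sketch, assuming a Livy endpoint on the EMR master node (jar path and class names are placeholders):

```python
import json
from urllib import request

LIVY_URL = "http://livy-host:8998"  # assumed EMR master endpoint

def submit_batch(jar, klass):
    """Build a POST /batches request; each batch becomes its own YARN
    application, so several can run side by side if the cluster has
    capacity."""
    payload = {"file": jar, "className": klass}
    return request.Request(
        LIVY_URL + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "X-Requested-By": "livy-client"},
    )

# Two independent batches; POSTing both (via urlopen) would run them
# concurrently, subject to the YARN scheduler's queue limits.
reqs = [submit_batch("s3://bucket/app.jar", "com.example.JobA"),
        submit_batch("s3://bucket/app.jar", "com.example.JobB")]
```

If jobs end up queued instead of running together, the bottleneck is usually YARN scheduler configuration, not Livy.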
5 votes, 0 answers

Bad Request: "requirement failed: Session isn't active." in Apache livy

code: public class PiApp { public static void main(String[] args) throws Exception { LivyClient client = new LivyClientBuilder().setURI(new URI("http://localhost:8998/")).build(); try { System.out.println("Uploading livy-example…
Pare
5 votes, 1 answer

Livy REST API: GET requests work but POST requests fail with '401 Authentication required'

I’ve written a Java client for parts of Livy’s REST API at https://github.com/apache/incubator-livy/blob/master/docs/rest-api.md. The client uses Spring’s RestTemplate.getForObject() and postForObject() to make GET and POST requests respectively.…
snark
5 votes, 1 answer

Why is Apache Livy session showing Application id NULL?

I've implemented a fully functional Spark 2.1.1 Standalone cluster, where I POST job batches via the curl command using Apache Livy 0.4. When consulting the Spark WEB UI I see my job along with its application id (something like this:…
5 votes, 3 answers

How to kill spark/yarn job via livy

I am trying to submit a Spark job via Livy using the REST API. But if I run the same script multiple times it runs multiple instances of the job with different job IDs. I am looking for a way to kill a Spark/YARN job running with the same name before starting a new one.…
roy
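
Per Livy's REST API, a batch (and its underlying Spark/YARN application) can be killed with `DELETE /batches/{id}`; to replace a job by name, a client would first `GET /batches`, match on the name, then delete. A sketch of the delete request, assuming a local endpoint and a placeholder batch id:

```python
from urllib import request

LIVY_URL = "http://localhost:8998"  # assumed endpoint

def build_kill(batch_id):
    """DELETE /batches/{id} asks Livy to kill the batch and the YARN
    application behind it. X-Requested-By is needed when CSRF
    protection is on."""
    return request.Request(
        f"{LIVY_URL}/batches/{batch_id}",
        method="DELETE",
        headers={"X-Requested-By": "livy-client"},
    )

kill_req = build_kill(42)  # 42 is a placeholder batch id
# request.urlopen(kill_req)  # uncomment against a live server
```

The equivalent without Livy is `yarn application -kill <appId>` on the cluster itself.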
5 votes, 1 answer

sparklyr livy connection with Kerberos

I'm able to connect to a non-Kerberized Spark cluster through the Livy service without problems from a remote RStudio desktop (Windows). However, if Kerberos security is enabled, the connection fails: library(sparklyr) sc <-…
runr
4 votes, 0 answers

PySpark virtual environment archive on S3

I'm trying to deploy PySpark applications to an EMR cluster that have various, differing, third-party dependencies, and I am following this blog post, which describes a few approaches to packaging a virtual environment and distributing that across…
user4601931
4 votes, 1 answer

YARN doesn't recognize increased 'yarn.scheduler.maximum-allocation-mb' and 'yarn.nodemanager.resource.memory-mb' values

I'm working with a dockerized PySpark cluster which utilizes YARN. To improve the efficiency of the data processing pipelines I want to increase the amount of memory allocated to the PySpark executors and the driver. This is done by adding the…
MilkSilk
4 votes, 2 answers

How to pull Spark job client logs submitted using the Apache Livy batches POST method with Airflow

I am working on submitting Spark jobs using the Apache Livy batches POST method. This HTTP request is sent using Airflow. After submitting a job, I am tracking its status using the batch id. I want to show the driver (client) logs in the Airflow logs to avoid going…
Ramdev Sharma
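
Livy exposes the batch driver logs over REST, so an Airflow task can poll them by batch id and echo them into its own log. A minimal sketch of the log request; the endpoint, batch id, and paging values are placeholder assumptions:

```python
from urllib import request

LIVY_URL = "http://localhost:8998"  # assumed Livy endpoint

def build_log_request(batch_id, start=0, size=100):
    """GET /batches/{id}/log returns a window of driver log lines;
    the 'from' and 'size' query parameters page through them. An
    Airflow task can loop on this until the batch reaches a terminal
    state, printing each page so it lands in the task log."""
    return request.Request(
        f"{LIVY_URL}/batches/{batch_id}/log?from={start}&size={size}"
    )

log_req = build_log_request(7, start=0, size=50)  # 7 is a placeholder id
# lines = json.load(request.urlopen(log_req))["log"]  # against a live server
```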
4 votes, 1 answer

Spark job submission using Airflow by submitting batch POST method on Livy and tracking job

I want to use Airflow for orchestration of jobs that includes running some Pig scripts, shell scripts and Spark jobs. Mainly for the Spark jobs, I want to use Apache Livy but am not sure whether it is a good idea to use it or to run spark-submit. What is the best way…
Ramdev Sharma
4 votes, 2 answers

Livy No YARN application is found with tag livy-batch-10-hg3po7kp in 120 seconds

I used Livy to execute a script stored in S3 via a POST request launched from EMR. The script runs, but it times out very quickly. I have tried editing the livy.conf configuration, but none of the changes seem to stick. This is the error that is…
Aaron Liang
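
The 120 seconds in the error matches Livy's default timeout for locating the submitted application in YARN. One commonly suggested mitigation (the key name is assumed from livy.conf.template; verify against your Livy version) is raising it in livy.conf and restarting the Livy server:

```
# livy.conf — assumed key name; requires a Livy server restart to take effect
livy.server.yarn.app-lookup-timeout = 300s
```

On EMR, edits made directly on the master node can also be overwritten by the cluster configuration mechanism, which would explain changes "not sticking"; applying them through an EMR configuration classification is the safer route.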
4 votes, 2 answers

Java application with Apache Livy

I decided to build a web service (app) for Apache Spark with Apache Livy. The Livy server is up and running on localhost port 8998 according to the Livy configuration defaults. My test program is a sample application from the Apache Livy documentation: …
4 votes, 1 answer

Setting spark.local.dir in Pyspark/Jupyter

I'm using PySpark from a Jupyter notebook and attempting to write a large Parquet dataset to S3. I get a 'no space left on device' error. I searched around and learned that it's because /tmp is filling up. I now want to edit spark.local.dir to point…
c3p0
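
A key constraint here: spark.local.dir is read when the SparkContext starts, so setting it from an already-running notebook has no effect. It has to go into spark-defaults.conf or into the session-creation configuration, e.g. a Livy/sparkmagic session payload (the path below is an assumed placeholder):

```python
# Sketch of a Livy session payload that sets spark.local.dir before
# the context starts; "/mnt/large-disk/spark-tmp" is a placeholder for
# any volume with enough free space.
session_payload = {
    "kind": "pyspark",
    "conf": {"spark.local.dir": "/mnt/large-disk/spark-tmp"},
}
```

With sparkmagic, the same `conf` map can be supplied via the `%%configure` magic before the session is created.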