Questions tagged [livy]

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface.

From http://livy.incubator.apache.org.

What is Apache Livy?

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, as well as Spark Context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers, thus enabling the use of Spark for interactive web/mobile applications. Additional features include:

  • Long-running Spark Contexts that can be used for multiple Spark jobs, by multiple clients
  • Cached RDDs or DataFrames shared across multiple jobs and clients
  • Multiple Spark Contexts managed simultaneously, running on the cluster (YARN/Mesos) instead of the Livy server, for good fault tolerance and concurrency
  • Jobs submitted as precompiled jars, snippets of code, or via the Java/Scala client API
  • Security via secure authenticated communication
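The REST interaction described above can be sketched as a minimal batch submission. The host/port, jar path, and class name below are hypothetical placeholders; the payload shape follows Livy's POST /batches endpoint.

```python
import json

# Minimal sketch of a Livy batch submission, assuming a Livy server at
# http://localhost:8998 (hypothetical host/port) and an application jar
# already on HDFS (hypothetical path).
LIVY_URL = "http://localhost:8998"

def batch_payload(jar, klass, args=None, conf=None):
    """Build the JSON body for POST /batches."""
    body = {"file": jar, "className": klass}
    if args:
        body["args"] = args
    if conf:
        body["conf"] = conf
    return body

payload = batch_payload(
    "hdfs:///jobs/my-app.jar",            # hypothetical jar location
    "com.example.MyApp",                  # hypothetical main class
    args=["--date", "2020-01-01"],
    conf={"spark.executor.memory": "2g"},
)

# To actually submit, POST the payload to a running Livy server, e.g.
# with the `requests` package (not done here):
#   requests.post(LIVY_URL + "/batches", data=json.dumps(payload),
#                 headers={"Content-Type": "application/json"})
print(json.dumps(payload, indent=2))
```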

288 questions
2
votes
0 answers

Use Apache Livy in a web application (Flask)

I'm building a web app with Flask that has some real-time machine learning functionality. I want to use Spark MLlib to analyze data and return the result within the app in real time. Then I found Livy, which I thought might be suitable for my…
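For a setup like the one described here, one common approach is to keep a long-running Livy session alive and have the web app post code snippets to it over REST. A minimal sketch using only the standard library (the Livy URL and statement kind are assumptions, and no request is actually sent in this snippet — a Flask route would call `submit_statement` and then poll the returned statement for its result):

```python
import json
from urllib import request

LIVY = "http://localhost:8998"  # hypothetical Livy endpoint

def statement_payload(code):
    """JSON body for POST /sessions/{id}/statements."""
    return {"code": code, "kind": "pyspark"}

def submit_statement(session_id, code):
    """Send a code snippet to an existing Livy session.

    Requires a running Livy server; shown here only as a sketch.
    """
    req = request.Request(
        f"{LIVY}/sessions/{session_id}/statements",
        data=json.dumps(statement_payload(code)).encode(),
        headers={"Content-Type": "application/json"},
    )
    return json.load(request.urlopen(req))
```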
2
votes
1 answer

Airflow: Use LivyBatchOperator for submitting pyspark applications in yarn

I have come across something called LivyBatchOperator, but I am unable to find a good example of using it to submit PySpark applications in Airflow. Any info on this would really be appreciated. Thanks in advance.
kavya
  • 75
  • 1
  • 10
2
votes
0 answers

Unable to reach the Spark cluster manager to request for executors

I have a Spark Standalone setup in my Docker Swarm cluster (1 manager node, 2 worker nodes). I also have a Livy container that is colocated with the Spark master container on the manager node. When initializing Livy sessions, the dynamic allocation…
Von Yu
  • 31
  • 4
2
votes
1 answer

How to kill a spark application gracefully

I have a process (in Scala) running in a Spark cluster which processes some data, uploads the result, and updates the processing state. I want the upload and the processing-state update to be an atomic operation, since the state is crucial for resuming the…
Sayantan Ghosh
  • 998
  • 2
  • 9
  • 29
2
votes
0 answers

Figure not updated on jupyter notebook (sparkmagic)

I am running the following command in a Jupyter notebook with sparkmagic in order to create a figure: df.toPandas().boxplot(). To show the figure, I run %matplot plt in a different cell. The first time I run it, it works well. If I run…
Kaharon
  • 365
  • 4
  • 16
2
votes
1 answer

Python packages not importing in AWS EMR

I am trying to submit a job to EMR cluster via Livy. My Python script (to submit job) requires importing a few packages. I have installed all those packages on the master node of EMR. The main script resides on S3 which is being called by the script…
Shweta
  • 135
  • 7
2
votes
0 answers

Spark futures time out and rack local locality reasons

I've been struggling with an issue that did not exist a few days ago: Spark performance is very bad compared to before (execution time exploded from minutes to hours with the same code, same source data, and same configs). Looking at the logs and the Spark Web UI I…
Luis Leal
  • 3,388
  • 5
  • 26
  • 49
2
votes
2 answers

How to store/get JSON to/from AWS Parameter Store

I use the Livy REST API to submit a Spark app. { "file": , "className": "", "args": my_args, "conf": my_conf } my_args = [args1, args2, ...] my_conf = {'foo1': 'bar1', 'foo2': 'bar2'...} I want my_conf (JSON secrets)…
Vinica
  • 51
  • 1
  • 2
  • 4
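One way to approach the question above: Parameter Store holds strings, so the conf can be serialized to JSON on write and parsed on read. A sketch of the round trip (the boto3 calls are shown only in comments and not executed; the parameter name, jar path, and class name are hypothetical):

```python
import json

# my_conf as in the question; storing it in AWS Parameter Store means
# serializing it to a string and parsing it on the way back out.
my_conf = {"foo1": "bar1", "foo2": "bar2"}

stored = json.dumps(my_conf)   # string value to put in the parameter
loaded = json.loads(stored)    # dict recovered after reading it back

# With boto3 (not run here), the round trip would look roughly like:
#   ssm = boto3.client("ssm")
#   ssm.put_parameter(Name="/spark/my_conf", Value=stored,
#                     Type="SecureString", Overwrite=True)
#   stored = ssm.get_parameter(Name="/spark/my_conf",
#                              WithDecryption=True)["Parameter"]["Value"]

# The recovered dict can then go straight into the Livy batch payload.
payload = {"file": "hdfs:///jobs/app.jar",   # hypothetical
           "className": "com.example.App",   # hypothetical
           "conf": loaded}
```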
2
votes
1 answer

Is there a way to return job status explicitly through Livy?

I need to return "DEAD"/"FAIL" as the job "status" if the PySpark job matches a certain condition. Example: from pyspark.sql import SparkSession spark = SparkSession.builder\ .master("yarn")\ .appName("IDL") \ …
Parijat Bose
  • 380
  • 1
  • 6
  • 22
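Since Livy derives a batch's state from the underlying YARN application, one common way to force a "dead" status is to exit the driver with a non-zero code when the condition fails. A minimal sketch (the condition check is a hypothetical placeholder for the real validation):

```python
import sys

def check_condition(row_count):
    """Hypothetical validation step; replace with the real check."""
    return row_count > 0

def main(row_count):
    if not check_condition(row_count):
        # A non-zero exit code fails the YARN application, which Livy
        # then surfaces as batch state "dead".
        return 1
    return 0

exit_code = main(0)
# In the real job you would call sys.exit(exit_code); here we only
# inspect the value.
```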
2
votes
4 answers

Increasing Spark application timeout in Jupyter/Livy

I'm using a shared EMR cluster with JupyterHub installed. If my cluster is under heavy load, I get an error. How do I increase the timeout for a Spark application from 60 seconds to something greater, like 900 seconds (15 mins)?
sho
  • 439
  • 2
  • 7
  • 13
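If the 60-second limit comes from sparkmagic's session startup wait, one knob to try is the startup timeout in `~/.sparkmagic/config.json` (key name as used in sparkmagic's example config; restart the notebook kernel after editing):

```json
{
  "livy_session_startup_timeout_seconds": 900
}
```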
2
votes
1 answer

Apache Livy REST API (/batches/) - How to return data back to the client

We are using Apache Livy 0.6.0-incubating and its REST API to call a custom Spark jar via the /batches/ endpoint. The custom Spark code reads data from HDFS and does some processing. This code succeeds and the REST response is also…
Mata
  • 439
  • 2
  • 10
2
votes
1 answer

How to manage the 'wait' state in NiFi ExecuteSparkInteractive processor?

I am running Spark code using the NiFi ExecuteSparkInteractive processor, and its outcome is either success, failed, or wait. I can manage and route the result perfectly fine when it comes to the success and failed states, but sometimes I see a file is…
Ajay Ahuja
  • 1,196
  • 11
  • 26
2
votes
0 answers

Pass environment variables to a Livy/PySpark job

I have a PySpark job that I submit to Livy via Livy's HttpClient, and I would like to pass some environment variables to it. I'm currently using a workaround in which the submitted code updates its os.environ manually. Is there a better way to do…
Bolchojeet
  • 455
  • 5
  • 14
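An alternative to patching os.environ, assuming the job runs on YARN: Spark forwards environment variables set through the `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*` conf prefixes, which can go straight into the Livy batch conf. A sketch (the file path is a hypothetical placeholder):

```python
# Spark on YARN exposes two conf prefixes for environment variables:
# spark.yarn.appMasterEnv.* (driver, in cluster mode) and
# spark.executorEnv.* (executors). Setting them in the Livy batch conf
# avoids updating os.environ inside the submitted job.
def payload_with_env(env):
    conf = {}
    for name, value in env.items():
        conf[f"spark.yarn.appMasterEnv.{name}"] = value
        conf[f"spark.executorEnv.{name}"] = value
    return {"file": "hdfs:///jobs/job.py",  # hypothetical script path
            "conf": conf}

p = payload_with_env({"MY_FLAG": "1"})
```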
2
votes
0 answers

Connecting sparkly to HDP-Sandbox Spark instance

I would like to connect an R instance to Spark running on HDP-Sandbox, deployed with Docker on a single local machine. The error message indicates that the --version call on spark-submit fails. R instance: packageVersion("sparklyr") # [1] ‘1.0.1’ # Set old JAVA…
Konrad
  • 17,740
  • 16
  • 106
  • 167
2
votes
1 answer

Run spark program using Livy as OS user

I have a Kerberized cluster and want to run Spark programs as the "OS user" using Livy. Using the proxyUser option only sets the YARN user to the proxy user; the OS user is still Livy. If this is not possible, can someone point me to the Livy…
Tech Guy
  • 21
  • 2
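For reference, impersonation in Livy is expressed through the proxyUser field of the submission payload (and requires impersonation to be enabled on the server side); as the question notes, this changes the YARN user that runs the application, not the OS user of the Livy server process itself. A minimal sketch with hypothetical values:

```python
# Batch submission body with Livy's proxyUser field set. The jar path,
# class name, and user name are hypothetical placeholders; server-side,
# livy.impersonation.enabled must be on for proxyUser to take effect.
payload = {
    "file": "hdfs:///jobs/app.jar",   # hypothetical jar location
    "className": "com.example.App",   # hypothetical main class
    "proxyUser": "alice",             # hypothetical OS-level user name
}
```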