Questions tagged [livy]

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface

From http://livy.incubator.apache.org.

What is Apache Livy?

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, and Spark Context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers, thus enabling the use of Spark for interactive web/mobile applications. Additional features include:

  • Long-running Spark Contexts that can be used for multiple Spark jobs, by multiple clients
  • Cached RDDs or DataFrames can be shared across multiple jobs and clients
  • Multiple Spark Contexts can be managed simultaneously, and they run on the cluster (YARN/Mesos) instead of on the Livy Server, for good fault tolerance and concurrency
  • Jobs can be submitted as precompiled jars, snippets of code, or via the Java/Scala client API
  • Security via secure authenticated communication
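
As a concrete sketch of the REST interface described above (assuming a Livy server at a hypothetical http://localhost:8998), an interactive session is created with POST /sessions and code snippets run with POST /sessions/{id}/statements. The snippet below only constructs the JSON request bodies; actually sending them requires a running Livy server:

```python
import json

LIVY_URL = "http://localhost:8998"  # hypothetical Livy endpoint

# POST /sessions -- create an interactive PySpark session
create_session = {"kind": "pyspark"}

# POST /sessions/{id}/statements -- run a code snippet in that session
run_statement = {"code": "sc.parallelize(range(10)).sum()"}

# The REST API exchanges JSON bodies.
headers = {"Content-Type": "application/json"}

print("POST", LIVY_URL + "/sessions", "->", json.dumps(create_session))
print("POST", LIVY_URL + "/sessions/0/statements", "->", json.dumps(run_statement))
```

The session id in the statements URL comes from the JSON returned by the /sessions call; polling GET /sessions/{id}/statements/{stmt_id} then retrieves the (possibly asynchronous) result.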


288 questions
2
votes
1 answer

Error when running Spark via Livy

I am running my Spark job using Livy; however, I get the exception below: java.util.concurrent.ExecutionException: java.io.IOException: Internal Server Error: "java.util.concurrent.ExecutionException: org.apache.livy.rsc.rpc.RpcException:…
Luckylukee
2
votes
0 answers

Best Practice to Generate PySpark Statements Using C#?

I am writing an ASP.NET Web API that at some point will communicate with an Apache Spark cluster. The communication is established using a Livy server on the Spark cluster, which exposes a REST API, and an HTTP client I wrote. In my business…
Anis Tissaoui
2
votes
2 answers

Zeppelin 0.7.2 version does not support spark 2.2.0

How can I downgrade the Spark version? What could be the other solutions? I have to connect my Hive tables to Spark using a Spark session, but the Spark version is not supported by Zeppelin.
SHWETA WIN
2
votes
1 answer

Livy server submits the jar every time a batch job is submitted

While submitting an Apache Spark batch job using the Livy server, it uploads the jar file (containing the application) every time, i.e., for every batch job submission. This seems to increase the job submission time. Is there a way to refer to the jar present in the…
Parithi
2
votes
3 answers

Livy PySpark Python Session Error in Jupyter with Spark Magic - ERROR repl.PythonInterpreter: Process has died with 1

I'm running a Spark v2.0.0 YARN cluster. I have Livy running beside the Spark master. I have set up a Jupyter Python3 notebook, installed Spark Magic, and followed the necessary instructions to connect Spark Magic to Livy, although when…
mildog8
2
votes
1 answer

Access a data file from the current livy session

I have a Spark cluster running on Hadoop in YARN mode. I have configured a Livy server to interact with it and submit client Spark jobs to the cluster. I uploaded a data file along with the jar from the Java program to Livy, which gets uploaded in the…
msingh
2
votes
2 answers

execute Spark jobs, with Livy, using `--master yarn-cluster` without making systemwide changes

I'd like to execute a Spark job via an HTTP call from outside the cluster using Livy, where the Spark jar already exists in HDFS. I'm able to spark-submit the job from a shell on the cluster nodes, e.g.: spark-submit --class io.woolford.Main --master…
Alex Woolford
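
One approach often suggested for this situation (a sketch under assumptions, not verified against this cluster) is to POST to Livy's /batches endpoint with the jar's HDFS path in the file field, which mirrors the spark-submit invocation without requiring shell access. Only io.woolford.Main comes from the question; the HDFS path is hypothetical, and whether the job lands in yarn-cluster mode is controlled by the server-side livy.conf (livy.spark.master / livy.spark.deploy-mode) rather than by the request:

```python
import json

# Sketch of a Livy batch submission equivalent to the spark-submit in the
# question. The HDFS path is hypothetical; io.woolford.Main is from the post.
batch_request = {
    "file": "hdfs:///user/spark/jobs/my-job.jar",  # jar already in HDFS
    "className": "io.woolford.Main",
    "args": [],
}

# POST this JSON body to http://<livy-host>:8998/batches.
payload = json.dumps(batch_request)
print(payload)
```

Because the master and deploy mode live in the server-side configuration, no system-wide change on the client is needed: the caller only ships this JSON over HTTP.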
1
vote
1 answer

How to mock connection for airflow's Livy Operator using unittest.mock

@mock.patch.dict("os.environ", AIRFLOW_CONN_LIVY_HOOK="http://www.google.com", clear=True) class TestLivyOperator(unittest.TestCase): def setUp(self): super().setUp() self.dag = DAG(dag_id=…
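
Setting aside the Airflow-specific pieces truncated above, the core mechanism — patching os.environ so that a hook resolves its connection from an AIRFLOW_CONN_* environment variable — can be sketched with the standard library alone (the connection URI below is the one from the question; everything else is an assumption):

```python
import os
import unittest
from unittest import mock

# Airflow resolves connections from AIRFLOW_CONN_* environment variables, so
# patching os.environ is enough for a hook to "see" a fake connection URI.
@mock.patch.dict("os.environ", {"AIRFLOW_CONN_LIVY_HOOK": "http://www.google.com"}, clear=True)
class TestLivyConnEnv(unittest.TestCase):
    def test_env_var_is_patched(self):
        # Inside the test, the patched environment is active.
        self.assertEqual(os.environ["AIRFLOW_CONN_LIVY_HOOK"], "http://www.google.com")
```

Run with python -m unittest. mock.patch.dict also accepts the keyword-argument form used in the question instead of an explicit dict, and the patch is automatically undone after each test method.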
1
vote
1 answer

PySpark (via sparkmagic + livy): There is insufficient memory for the Java Runtime Environment to continue

I'm using SageMaker, connecting to an EMR cluster via sparkmagic and Livy. Very frequently (at session startup, not running any code) I get: > The code failed because of a fatal error: Session unexpectedly > reached final status 'dead'. See…
Luis Leal
1
vote
1 answer

Can't init session in Spark. How to debug "User capacity has reached its maximum limit."?

I'm trying to create a session in Apache Spark using the Livy REST API. It fails with the following error: User capacity has reached its maximum limit. The user is running another Spark job. I don't understand which capacity reached its maximum and…
neves
1
vote
0 answers

How to get the state of a remote job in Livy using Java API

Is it possible to monitor the state of an already running remote job in Livy with the Java API? How can this be done? I looked over the Livy Java API docs. A JobHandle would let me poll the state of the app. However, the only way I can see to obtain it is…
oskarryn
1
vote
0 answers

How can we connect to remote spark cluster via jupyterhub?

First of all, my agenda is to be able to use Spark code inside JupyterHub. In other words, I want to connect a remote Spark cluster to JupyterHub. After searching, I came up with two solutions: 1) Livy and 2) Spark Magic. I have tried Livy but…
1
vote
1 answer

Spark requests more cores than asked for when calling the POST Livy batch API in Azure Synapse

I have an Azure Synapse Spark cluster with 3 nodes of 4 vCores and 32 GB memory each. I am trying to submit a Spark job using the Azure Synapse Livy batch APIs. The request looks like this: curl --location --request POST…
1
vote
1 answer

Databricks Notebook as Substitute for livy sessions endpoint

I want to execute a Databricks notebook's code via the Databricks API and get the output of the notebook's code as the response. Is it possible, or is there any workaround for the same? Is the same possible with the Databricks SQL API?
1
vote
0 answers

PySpark batch job's configuration submitted through Apache Livy have no effect

I submitted a Spark batch job through Livy to the remote cluster with the following request body: REQUEST_BODY = { 'file': '/spark/batch/job.py', 'conf': { 'spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation': 'true', …
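
For comparison, a complete minimal /batches body with a conf map might look like the sketch below. The file path and the legacy flag come from the question; the extra key is illustrative. Conf values are passed as strings, and a spark.* setting only takes effect if the cluster's Spark version actually recognizes it:

```python
import json

# Illustrative Livy /batches request body; the file path and legacy flag
# mirror the question, the executor-memory key is an assumption.
request_body = {
    "file": "/spark/batch/job.py",
    "conf": {
        # Spark confs are passed as string values in the JSON body.
        "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation": "true",
        "spark.executor.memory": "2g",
    },
}

print(json.dumps(request_body, indent=2))
```

If a submitted conf appears to have no effect, comparing the job's Spark UI "Environment" tab against this body is a quick way to see whether Livy forwarded the setting or the target Spark version silently ignored it.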