Questions tagged [livy]

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface

From http://livy.incubator.apache.org.

What is Apache Livy?

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, and Spark Context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers, thus enabling the use of Spark for interactive web/mobile applications. Additional features include:

  • Have long running Spark Contexts that can be used for multiple Spark jobs, by multiple clients
  • Share cached RDDs or DataFrames across multiple jobs and clients
  • Multiple Spark Contexts can be managed simultaneously, and the Spark Contexts run on the cluster (YARN/Mesos) instead of the Livy Server, for good fault tolerance and concurrency
  • Jobs can be submitted as precompiled jars, snippets of code, or via a Java/Scala client API
  • Ensure security via secure authenticated communication
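
To make the REST interaction described above concrete, here is a minimal sketch in Python that only builds the HTTP requests and does not send them. It assumes a Livy server at http://localhost:8998 (the default port) and the commonly documented /sessions, /sessions/{id}/statements and /batches endpoints; the helper names (livy_request, create_session, run_statement, submit_batch) are hypothetical and not part of any Livy client library.

```python
import json
from urllib.request import Request

# Assumption: a Livy server listening locally on its default port.
LIVY_URL = "http://localhost:8998"


def livy_request(path, payload):
    """Build (but do not send) a JSON POST request for a Livy REST endpoint."""
    return Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(kind="pyspark"):
    """Request a new interactive session (a long-running Spark context)."""
    return livy_request("/sessions", {"kind": kind})


def run_statement(session_id, code):
    """Request execution of a code snippet inside an existing session."""
    return livy_request(f"/sessions/{session_id}/statements", {"code": code})


def submit_batch(file_path):
    """Request a batch job from a file the cluster can already read."""
    return livy_request("/batches", {"file": file_path})


if __name__ == "__main__":
    # Print each request instead of sending it, so the sketch runs offline.
    for req in (
        create_session(),
        run_statement(0, "sc.parallelize(range(100)).sum()"),
        submit_batch("/user/test/pi.py"),
    ):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

To actually send one of these requests, pass it to urllib.request.urlopen and parse the JSON response. A secured deployment (for example Kerberos/SPNEGO) would need additional authentication handling that this sketch leaves out.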

288 questions
3
votes
0 answers

Livy is failing when a job is submitted in YARN cluster mode

I have a Livy server running locally, and it connects to a remote YARN cluster to run Spark jobs. I am getting the error below when I upload my jar through the Livy programmatic Job API. It looks like the correct Netty channel handler is not…
3
votes
2 answers

curl: how to use Kerberos instead of NTLM authentication on Windows?

I'm trying to connect to a Livy REST service under Kerberos security. On CentOS Linux, curl works fine with negotiate: after obtaining a Kerberos ticket with kinit, the connection goes through with curl --negotiate -u : http://service_link The problem I'm facing is…
runr
  • 1,142
  • 1
  • 9
  • 25
3
votes
1 answer

Display the full column size in Zeppelin %sql

I simply want to display a column without truncation in a select where I have an array or a Map of very great length. I use Zeppelin to query a DataFrame registered as a temp table: %livy.sql select * from maTable I would like to have the full length of…
a.moussa
  • 2,977
  • 7
  • 34
  • 56
3
votes
1 answer

Upload Python script using Livy

I am trying to find a way to push a Python script to the Spark server using the Livy API (or client). I have tried the following: curl -X POST --data '{"file": "/user/test/pi.py"}' -H "Content-Type: application/json" localhost:8998/batches ,…
shubham
  • 547
  • 2
  • 6
  • 20
2
votes
1 answer

How to set up a virtual env or install a Python library when I try to submit a PySpark job to Databricks from Airflow?

I need to submit a PySpark job to Airflow through LivyOperator. I see there are arguments to the LivyOperator's init method where users can pass in a list of Python files, but is there a way to do this more cleanly? For example, what if I would like…
WZH
  • 354
  • 2
  • 12
2
votes
1 answer

Azure Synapse Spark LIVY_JOB_STATE_ERROR

I'm experiencing the following error when executing any cell in my notebook: LIVY_JOB_STATE_ERROR: Livy session has failed. Session state: Killed. Error code: LIVY_JOB_STATE_ERROR. [(my.synapse.spark.pool.name) WorkspaceType: CCID:<(hexcode)>]…
frammnm
  • 537
  • 1
  • 5
  • 17
2
votes
0 answers

Start PySpark in Jupyter notebook on EMR 6.5

I am trying to start a PySpark job using the Amazon EMR JupyterHub feature, as follows: And with the following code: from pyspark import SparkSession spark = SparkSession \ .builder \ .appName("My App") \ .getOrCreate() But at the end, I…
2
votes
1 answer

SparkMagic PySpark3 session with Livy on Cloudera

I am trying to run a JupyterHub PySpark kernel session with Python 3, with Livy running on a Cloudera cluster. The Spark session ends without any meaningful error, and the Livy logs have the following: 21/07/21 12:28:09 INFO yarn.Client: Submitting application…
Artyom Rebrov
  • 651
  • 6
  • 23
2
votes
1 answer

How to include BigQuery Connector inside Dataproc using Livy

I'm trying to run my application using Livy that resides inside GCP Dataproc but I'm getting this: "Caused by: java.lang.ClassNotFoundException: bigquery.DefaultSource" I'm able to run hadoop fs -ls gs://xxxx inside Dataproc and I checked if Spark…
Celso Marques
  • 378
  • 1
  • 4
  • 15
2
votes
1 answer

AWS MWAA - connection timed out when trying to make a REST call to Livy

I created an MWAA environment using the public network option (version 2.0.2). I created a sample Airflow DAG that starts an EMR cluster with the following properties: JOB_FLOW_OVERRIDES = { 'Name': 'demo-cluster-airflow', 'ReleaseLabel': 'emr-6.2.0', 'LogUri':…
Grish
  • 93
  • 6
2
votes
1 answer

Apache Livy 0.7.0 Failed to create Interactive session

While creating a new session using Apache Livy 0.7.0, I am getting the error below. I am also using a Zeppelin notebook (Livy interpreter) to create the session. Using Scala version 2.12.10, Java HotSpot(TM) 64-Bit Server VM, 11.0.11 Spark 3.0.2 zeppelin…
Sushil Behera
  • 817
  • 6
  • 20
2
votes
1 answer

Jupyterhub pyspark3 on AWS EMR YARN Cluster

I'm running JupyterHub with the pyspark3 kernel on an AWS EMR cluster. As we might know, JupyterHub pyspark3 on EMR uses a Livy session to run workloads on the AWS EMR YARN scheduler. My question is about the configuration of Spark: executor memory/cores, driver…
2
votes
1 answer

Change Python version Livy uses in an EMR cluster

I am aware of Change Apache Livy's Python Version and How do i setup Pyspark in Python 3 with spark-env.sh.template. I have also seen the Livy documentation. However, none of that works. Livy keeps using Python 2.7 no matter what. This is running…
Clay
  • 2,584
  • 1
  • 28
  • 63
2
votes
0 answers

How to submit Spark job through containerized Livy

I'm using the following repo to run Spark (2.4.7) and Livy (0.7). The curl commands shown in the repo work fine, and it seems that everything is up and running. I wrote a simple word-counting Maven Spark Java program and used the Livy client to submit…
Oded
  • 336
  • 1
  • 3
  • 17
2
votes
0 answers

Submit Spark job using SparkMagic + Livy with local jar file?

Does anyone know how to submit a Spark job to Livy, via SparkMagic, but with a local jar file as a dependency? I looked around (such as here: https://livy.apache.org/docs/latest/rest-api.html), and it looks like Livy requires the jar file to be in HDFS…
kcode2019
  • 119
  • 1
  • 7