Questions tagged [livy]

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface.

From http://livy.incubator.apache.org.

What is Apache Livy?

Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, and Spark Context management, all via a simple REST interface or an RPC client library. Apache Livy also simplifies the interaction between Spark and application servers, thus enabling the use of Spark for interactive web/mobile applications. Additional features include:

  • Keep long-running Spark Contexts that can be used for multiple Spark jobs by multiple clients
  • Share cached RDDs or DataFrames across multiple jobs and clients
  • Manage multiple Spark Contexts simultaneously; the Spark Contexts run on the cluster (YARN/Mesos) instead of the Livy Server, for good fault tolerance and concurrency
  • Submit jobs as precompiled JARs, snippets of code, or via the Java/Scala client API
  • Ensure security via secure authenticated communication
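
As a rough illustration of the REST workflow described above, the following Python sketch builds the requests for creating an interactive session, submitting a code snippet, and fetching its result. The endpoints (`/sessions`, `/sessions/{id}/statements`) and the `kind` and `code` fields come from the Livy REST API; the host `localhost`, the helper names (`build_request`, `create_session`, `run_statement`, `get_statement`, `send`), and the use of the standard-library `urllib` are assumptions made for this example, not part of Livy itself.

```python
import json
from urllib import request

# Assumption: a Livy server reachable on its default port 8998 on a hypothetical host.
LIVY_URL = "http://localhost:8998"


def build_request(method, path, payload=None, base_url=LIVY_URL):
    """Build (but do not send) a JSON request for the Livy REST API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return request.Request(
        url=base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def create_session(kind="pyspark"):
    """POST /sessions starts a new interactive session of the given kind."""
    return build_request("POST", "/sessions", {"kind": kind})


def run_statement(session_id, code):
    """POST /sessions/{id}/statements submits a snippet of code to a session."""
    return build_request("POST", f"/sessions/{session_id}/statements", {"code": code})


def get_statement(session_id, statement_id):
    """GET /sessions/{id}/statements/{statementId} retrieves a statement's state and output."""
    return build_request("GET", f"/sessions/{session_id}/statements/{statement_id}")


def send(req):
    """Send a prepared request and decode the JSON response (needs a live server)."""
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a running server, `send(create_session())` should return a JSON description of the session, including its `id` and `state`. Statements are normally submitted only once the session state has become `idle`, so a real client would poll `GET /sessions/{id}` before calling `run_statement`.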


288 questions
0
votes
0 answers

{"msg":"requirement failed: Cannot find Livy REPL jars."}

curl -XPOST 'http://node1:8998/sessions' -H "Content-Type:application/json" --data '{"kind":"spark"}'

A fully distributed cluster of three nodes, running Spark 3.0.0. livy.conf: livy.spark.master = yarn, livy.spark.deployMode = cluster, livy.environment =…
GGBond
  • 1
0
votes
0 answers

Hue PySpark connector using Livy - Increase Spark driver memory for interactive sessions

We are using CDP Private Cloud 7.1.7 and have configured the Hue connector for PySpark using Livy. By default the driver launches with 1 GB of memory, and I need to increase this because some of the code we run is failing with OOM errors for…
0
votes
1 answer

org.apache.hadoop.yarn.api.ApplicationClientProtocolPB is unauthorized Error?

I have a Kerberized Hadoop cluster. I set this in core-site.xml: hadoop.proxyuser.hue.hosts *, hadoop.proxyuser.hue.groups *. And I've…
CompEng
  • 7,161
  • 16
  • 68
  • 122
0
votes
0 answers

EOL/Obsolete Software: Apache Log4j 1.X Detected

Currently we have Apache Log4j 1.X in one of our Docker images. It is reported as EOL and needs to be updated to Apache Log4j 2.X, but we are unable to find the package to upgrade it to Log4j 2.X. The reported package, log4j-1.2.16.jar, needed…
0
votes
0 answers

What's the path of external file after submitting the spark submit using Livy?

I'm submitting a Spark job using the Livy batch API as shown below. Here I'm passing the .p12 file in the files parameter; it will be used later in the application for SSL communication. { "className":"com.StreamingMain", …
coders
  • 719
  • 1
  • 11
  • 35
0
votes
1 answer

How to pass Spark executor parameters to DBT?

I am using DBT with spark-livy-adapter, and I want to pass spark-conf properties such as --num_executors via the DBT CLI command, like this: dbt run --select my_model --spark_conf "num-executors":5 Please help me with this.
Gabber
  • 7,169
  • 3
  • 32
  • 46
0
votes
0 answers

EMR Function logs not emitted

EMR function logs are not emitted in the hadoop-yarn folder. We are using Livy to submit jobs to EMR. I have added the following configuration for logs: classification: 'spark-log4j', configurationProperties: { "log4j.rootCategory":…
nimesh
  • 1
0
votes
0 answers

LivyClient is not able to upload jar with dependency

Below is my code. LivyClient is unable to upload a JAR with dependencies and throws java.util.concurrent.ExecutionException. Please help me resolve this issue. public class LivyConfig { @Bean public LivyClient client() throws…
0
votes
0 answers

Problems configuring LivyOperator in Airflow

For LivyOperator we set the following parameters: polling_interval=60, retries_num_timeout=100. We set it up according to this documentation:…
Павел Иванов
  • 1,863
  • 5
  • 28
  • 51
0
votes
0 answers

Sparkmagic errors out using iPython 7.33.0

I am attempting to connect to an Amazon EMR cluster using Livy 0.7 and Spark from an Amazon Sagemaker Notebook running Amazon Linux 2. Can anyone help me understand this error and how I might go about fixing it? When I go to run the following…
0
votes
0 answers

Apache Livy Docker container error "No log line matching the '' filter"

I built Apache Livy from the official repository (https://github.com/apache/incubator-livy) after changing the Spark version to 3.x. The build is successful. After that I made a Docker image to run the Livy container in a K8s pod. After the necessary changes in Docker, able…
cns kumar
  • 7
  • 3
0
votes
0 answers

The session is always in the starting state when Livy hits a problem executing spark-submit

I deliberately gave an incorrect JAR path, and the result was that the session stayed in the starting state forever.
study
  • 21
  • 3
0
votes
0 answers

How to access an ADLS Gen2 file from Apache LivyOperator in Airflow

I am trying to execute a Python file that resides in an ADLS Gen2 location from Airflow's LivyOperator. livy_python_task = LivyOperator(task_id='pi_python_task',livy_conn_id='livy_default',…
Pranav
  • 57
  • 8
0
votes
1 answer

How to switch the Livy job log level to DEBUG mode?

In Spark we can add the -Dlog4j.debug config. Is there an equivalent in the Airflow Livy operator? I have already browsed…
0
votes
0 answers

PySpark and Spark not working in Apache Hue

I want to know the cause. OS: Ubuntu 20.4, Hue version: 4.10.0, Livy version: 0.8.0, Spark version: 3.3.0, Hadoop version: 3.3.4, Hive version: 3.1.3. After installing Livy to use PySpark, I checked the operation of PySpark using curl. And in Apache…
jsk
  • 3
  • 3