I have encountered something called LivyBatchOperator, but I am unable to find a good example of using it to submit PySpark applications in Airflow. Any info on this would really be appreciated. Thanks in advance.
1 Answer
I came across this blog post, which walks you through the available options for Airflow + Spark.
Here is an example of LivyBatchOperator, and here is how to install airflow-livy-operators.
I would recommend the following options:
- AWS EMR: use EmrAddStepsOperator.
- Regular Spark cluster: use the mechanism above to set up the Livy operators in Airflow. This keeps the configuration clean from the Airflow server's perspective, with Livy acting as the front end to the Spark cluster; see the sketch after this list.
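
For reference, here is a minimal sketch of a DAG wiring up LivyBatchOperator, assuming the airflow-livy-operators package; the import path, parameter names, HDFS paths, and app name below are assumptions to verify against the version you install:

```python
# Minimal sketch, assuming the airflow-livy-operators package.
# Import path and parameter names may differ across versions;
# all paths and names below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow_livy.batch import LivyBatchOperator  # assumed import path

with DAG(
    dag_id="pyspark_via_livy",
    start_date=datetime(2020, 7, 1),
    schedule_interval=None,
) as dag:
    spark_job = LivyBatchOperator(
        task_id="spark_job",
        name="my_pyspark_app",
        # For PySpark, `file` must be the .py entry-point script.
        file="hdfs:///apps/my_app/main.py",
        # Dependencies (ZIP/egg) shipped alongside the driver.
        py_files=["hdfs:///apps/my_app/deps.zip"],
        arguments=["--run-date", "{{ ds }}"],
        conf={"spark.executor.memory": "2g"},
    )
```

The operator essentially wraps Livy's `/batches` REST endpoint, so each parameter maps to a field of that API.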
Let me know how it goes!

Abdul
- Thanks, the respective blog posts helped me get started. Can we pass a ZIP file in the **file** parameter and a **class_name** when submitting PySpark applications through Livy? – kavya Jul 01 '20 at 17:43
- Yes, there is an option to pass ZIP files, using the `files` argument rather than `file`: `files` sends a list of ZIP files; `file`, in the case of Python, is the entry point that runs the Spark driver; `class_name` is the main class name for Java/Spark applications. Refer to the Livy REST API documentation, which is the backbone of this LivyBatchOperator (a sketch of the underlying REST call follows these comments): https://livy.incubator.apache.org/docs/latest/rest-api.html – Abdul Jul 01 '20 at 20:02
- I am getting issues when I try this: `LivyBatchOperator(task_id='spark_job', file='/abc/xyz.zip', class_name='src.foo.py', conf={"spark.submit.pyFiles": '/abc/lmn.zip'})`, where src.foo.py is a file inside xyz.zip. `Error: --py-files given but primary resource is not a Python script`. @Abdul – kavya Jul 02 '20 at 12:01
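
To make the mapping in the comments concrete, here is a sketch of the raw REST call that LivyBatchOperator issues under the hood; the host and paths are placeholders, but the field names (`file`, `pyFiles`, `className`, `conf`) come from the Livy batches API linked above:

```python
# Sketch of the raw Livy /batches call the operator wraps
# (host and paths are placeholders).
import requests

payload = {
    # Entry point: a .py script for PySpark, or a jar for Java/Scala.
    "file": "hdfs:///apps/my_app/main.py",
    # ZIP/egg dependencies for Python jobs (a list, not a single value).
    "pyFiles": ["hdfs:///apps/my_app/deps.zip"],
    # className is only for Java/Scala jobs (the Spark main class):
    # "className": "com.example.Main",
    "conf": {"spark.executor.memory": "2g"},
}

resp = requests.post(
    "http://livy-host:8998/batches",
    json=payload,
    headers={"Content-Type": "application/json"},
)
print(resp.json())  # contains the batch id and its current state
```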
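
Regarding the error in the last comment: Spark's `--py-files given but primary resource is not a Python script` means that when `py_files` are supplied, the primary `file` must itself be a `.py` script; it cannot be a ZIP. A sketch of how the call could be restructured, reusing the placeholder paths from the comment (the entry-point script `foo.py` is assumed to be extracted from xyz.zip):

```python
# Sketch of a corrected call: `file` points at a .py entry point,
# and the ZIPs travel as importable dependencies via py_files.
spark_job = LivyBatchOperator(
    task_id="spark_job",
    file="/abc/foo.py",                          # .py entry point, not a ZIP
    py_files=["/abc/xyz.zip", "/abc/lmn.zip"],   # importable as src.foo, etc.
    # class_name is only for Java/Scala jobs; omit it for PySpark.
)
```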