For LivyOperator we set the following parameters:

polling_interval=60
retries_num_timeout=100

We set it up according to this documentation: https://airflow.apache.org/docs/apache-airflow-providers-apache-livy/stable/_api/airflow/providers/apache/livy/operators/livy/index.html

However, with this configuration the Livy session is interrupted after 100 * 60 seconds = 6000 seconds = 1 hour 40 minutes: the operator is marked as failed and the load is aborted. Is there any way to resolve this inconsistency on the Airflow or Livy side?
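To make the arithmetic concrete, here is a minimal sketch. It assumes the effective wait budget is simply the product of the two settings, which matches the behaviour we observe but is not taken from the provider's source. The helper names `wait_budget` and `retries_needed` are hypothetical and exist only for illustration.

```python
import math
from datetime import timedelta


def wait_budget(polling_interval: int, retries: int) -> timedelta:
    """Total polling time before the operator gives up.

    Assumption: the budget is polling_interval * retries, as observed.
    """
    return timedelta(seconds=polling_interval * retries)


def retries_needed(expected_seconds: int, polling_interval: int) -> int:
    """Smallest retry count whose budget covers a load of the given length."""
    return math.ceil(expected_seconds / polling_interval)


# Our current settings: polling_interval=60, retries_num_timeout=100
print(wait_budget(60, 100))          # 1:40:00

# Retries that a hypothetical 4-hour load would need at the same interval
print(retries_needed(4 * 3600, 60))  # 240
```

Under the same assumption, any load that runs longer than the computed budget would hit the failure described above.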

Павел Иванов
  • What exactly do you expect from these settings? You are telling Airflow to check every 60 seconds whether the job has finished, and to fail after 100 retries. Is that what you want or not? – Hussein Awala Feb 19 '23 at 22:23
  • Well, from our observations: when many small tables are loaded, it makes sense to decrease polling_interval, since they load quickly, and the sooner we learn that a load has completed, the sooner the next operator can start. For large tables, on the contrary, it makes sense to increase retries_num_timeout and polling_interval. On very large tables we observe the behaviour described above. – Павел Иванов Feb 20 '23 at 06:30

0 Answers