
I have two separate Spark files (with a `.py` extension). I want to schedule both files using Airflow, but both need to start at the same time. How can I do that?

However, when I manually submit the Spark files using two different threads, they do run in parallel.

So I just want to schedule both files to start at the same time. How can I do that using Airflow?

Jaydip Dey
  • Well, in Airflow you can use either a `BashOperator` or a `PythonOperator` to run the scripts individually. To set the dependencies, look [here](https://airflow.apache.org/docs/stable/concepts.html#bitshift-composition). Basically, if you want `task_1` to always run before `task_2`, you can write `task_1 >> task_2`. – manesioz Jan 23 '20 at 14:37
  • It is totally clear that I can set dependencies. But for now, I have two Python files and want them to execute in parallel instead of sequentially. – Jaydip Dey Jan 24 '20 at 06:10
  • Does this answer your question? [Running airflow tasks/dags in parallel](https://stackoverflow.com/questions/52741536/running-airflow-tasks-dags-in-parallel) – UJIN Jan 24 '20 at 10:48
  • Well let's say you want to run `t1`, then `t3` and `t2` in parallel, and finally `t4`. At the bottom of your DAG where you set dependencies you can write `t1 >> [t2, t3] >> t4` – manesioz Jan 24 '20 at 11:34
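Putting the suggestions from the comments together, here is a minimal DAG sketch. It assumes Airflow 1.10-style imports, that `spark-submit` is available on the worker's PATH, and the script paths are placeholders; because no dependency is set between the two tasks, the scheduler is free to start them at the same time.

```python
# Minimal sketch: two independent Spark jobs in one DAG (paths are placeholders).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG(
    dag_id="parallel_spark_jobs",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Two independent tasks: since no >> dependency is set between them,
    # Airflow can start both at the same time.
    spark_job_1 = BashOperator(
        task_id="spark_job_1",
        bash_command="spark-submit /path/to/first_job.py",
    )

    spark_job_2 = BashOperator(
        task_id="spark_job_2",
        bash_command="spark-submit /path/to/second_job.py",
    )

    # Deliberately no spark_job_1 >> spark_job_2 line here; leaving the tasks
    # unconnected (or writing start >> [spark_job_1, spark_job_2]) is what
    # allows them to run in parallel.
```

Note that parallel execution also depends on the Airflow configuration: the executor must support it (e.g. `LocalExecutor` or `CeleryExecutor`, not the default `SequentialExecutor`), and settings like `parallelism` and `dag_concurrency` must be greater than 1, as discussed in the linked question about running Airflow tasks in parallel.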

0 Answers