I need to submit a PySpark job from Airflow through the LivyOperator. I see there are arguments to the LivyOperator's init method where users can pass in a list of Python files, but is there a way to do this more cleanly? For example, what if I would like to install some 3rd-party library? Is there a way I can set up a virtual environment? Thanks.
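For reference, this is roughly what I have now (file paths and the task id are just placeholders for my actual job):

```python
from airflow.providers.apache.livy.operators.livy import LivyOperator

# Submit the main PySpark script to Livy, shipping extra Python files alongside it
submit_job = LivyOperator(
    task_id="submit_pyspark_job",
    file="hdfs:///jobs/my_job.py",           # main PySpark script
    py_files=["hdfs:///jobs/helpers.py"],    # extra Python files passed via py_files
    livy_conn_id="livy_default",
)
```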
1 Answer
To run a job on Databricks you need to use the Databricks-specific operators. Specifically, look at DatabricksSubmitRunOperator. This operator lets you specify the tasks to execute together with the libraries required for those tasks.
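A rough sketch of what that looks like (the cluster spec, file path, and package name below are placeholders, not taken from your setup):

```python
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Submit a one-off run on a new cluster; libraries are installed on the
# cluster before the Python task starts, so 3rd-party packages are available.
submit_run = DatabricksSubmitRunOperator(
    task_id="run_pyspark_on_databricks",
    new_cluster={
        "spark_version": "11.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
    spark_python_task={"python_file": "dbfs:/jobs/my_job.py"},
    libraries=[{"pypi": {"package": "some-3rd-party-lib==1.0.0"}}],
)
```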
P.S. There really isn't enough information here to give a more detailed answer...

Alex Ott