
I am running Airflow in its own virtual environment, along with a couple of data-quality DAGs that have their own specific requirements. Effectively, I want to run those DAGs in their own virtual environments rather than cluttering the base Airflow environment.

PythonVirtualenvOperator does something similar, but it creates a new environment on every run and removes it afterwards. For a DAG that runs a couple of times a day, that is not efficient in either time or space.
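
For illustration, a task along these lines (the callable and the pinned requirement are made up) rebuilds its environment from scratch on every run:

from airflow.operators.python import PythonVirtualenvOperator

def check_quality():
    # imports must live inside the callable, because it executes
    # in the freshly created environment
    import pandas
    print(pandas.__version__)

quality_check = PythonVirtualenvOperator(
    task_id="quality-check",
    python_callable=check_quality,
    requirements=["pandas==1.4.1"],  # resolved into a brand-new env on every run
)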

I couldn't find a way to run the DAGs in separate virtual environments within the same Airflow installation. Is there any way to do it?

2 Answers


What I do now is something like the approach in this answer.

I basically have different conda environments and call their interpreters explicitly using the BashOperator. Here is an example:

import os
from airflow.operators.bash import BashOperator

parse_my_files = BashOperator(
    task_id='parse-files',
    # run the script with the interpreter of the task's own environment
    bash_command=f"{path_to_python} {abs_path_code}/my_repository/scripts/"
                 f"report_processing/pipelines/parse.py",
    env={"PATH": os.environ["PATH"],
         "DB_URL": db_url},
)
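
Here path_to_python has to point at the interpreter of the pre-built environment. A sketch of how it might be derived (the environment name and conda location are assumptions):

# hypothetical: the interpreter of the dedicated conda environment
path_to_python = os.path.expanduser("~/miniconda3/envs/parse_env/bin/python")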

To keep the packages in such an environment up to date, you need to activate it and run your dependency resolver. For us this is done with poetry:

install_dependencies = BashOperator(
    task_id=f"install-dependencies-{folder}",
    # conda run executes the command inside the named environment
    # without having to activate it first
    bash_command=f"cd {abs_path_code}/{folder}; conda run -n {env_name} poetry install",
)
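
In the DAG, the install task can then simply be ordered before the tasks that use the environment:

# make sure dependencies are installed before the scripts run
install_dependencies >> parse_my_files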

It would be nice to have a Python operator that takes an existing environment and reuses it every time, but as I understand it, this is not on their to-do list.

Hans Bambel
  • thanks for the suggestion. I am actually running it this way currently, explicitly activating the environment and then running the DAG. As a stop-gap this solves the issue, but it would still be good if the PythonVirtualenvOperator could be run without creating a new env every time – Saradindu Sengupta Mar 31 '22 at 17:25

@Saradindu Sengupta

Have you considered utilising the DockerOperator()? See the Official Airflow Docker Reference.

You could build an image with your specific requirements and execute it via the DockerOperator.
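
A minimal sketch, assuming the Docker provider package (apache-airflow-providers-docker) is installed and an image called my-dq-env has been built with your requirements:

from airflow.providers.docker.operators.docker import DockerOperator

parse_files = DockerOperator(
    task_id="parse-files",
    image="my-dq-env:latest",  # pre-built with the DAG's specific requirements
    command="python /app/pipelines/parse.py",
    auto_remove=True,          # remove the container once the task finishes
    docker_url="unix://var/run/docker.sock",
)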

dimButTries
  • yes, and that would be ideal in a production scenario, but I wanted to run the DAGs without any containerization at this moment, to keep the dev process easier and faster – Saradindu Sengupta Mar 31 '22 at 17:27