I am using Airflow 2.0's TaskFlow API to generate DAGs in order to orchestrate ETL jobs.
Airflow 2.0 doesn't seem to provide a framework for generating DAGs according to the DRY principle. Basically, each DAG needs to be generated in a separate file and there is a lot of copy/paste involved. A DAG's structure is always the same in the ETL context (only config parameters change based on the specific use case). A standard DAG as generated by Airflow 2.0's TaskFlow API is defined as follows:
@dag(schedule_interval='30 12 * * *', default_args=default_args, catchup=False)
def test_dag():
    @task
    def _extract():
        pass

    @task
    def _transform():
        pass

    @task
    def _load():
        pass
Airflow 2.0 does not use classes or an OOP approach to generate these DAGs. But since the structure of every DAG is the same in the ETL context and always has to contain the functions "extract", "transform" and "load", I thought it would be a good idea to use an OOP approach and set up an abstract base class declaring "extract", "transform" and "load" as abstract methods, thereby ensuring that these are contained in each newly generated DAG.
Therefore I am trying to build a class interface using the abstract factory pattern based on Python's abc (Abstract Base Classes) module.
This generally works fine for abstract methods that are not nested, e.g.:
from abc import ABC, abstractmethod

class AbstractDag(ABC):
    def __init__(self):
        pass

    @abstractmethod
    def test_dag(self):
        pass
When someone creates a class that inherits from AbstractDag without defining the abstract method "test_dag", an error is thrown as soon as the class is instantiated, as expected, since "test_dag" must be implemented as specified in AbstractDag. So all good here.
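For reference, this is the enforcement described above, demonstrated with a hypothetical incomplete subclass (IncompleteDag is an illustrative name, not from the original post):

```python
from abc import ABC, abstractmethod

class AbstractDag(ABC):
    @abstractmethod
    def test_dag(self):
        pass

# A subclass that forgets to implement test_dag:
class IncompleteDag(AbstractDag):
    pass

try:
    IncompleteDag()
except TypeError as err:
    # TypeError: Can't instantiate abstract class IncompleteDag ...
    print(err)
```

Note that the error is raised at instantiation time, not when the class is defined.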
But the same approach does not work with nested methods, which the Airflow DAG requires. No check is run: even if a nested method is missing, no error is thrown.
from abc import ABC, abstractmethod

class AbstractDag(ABC):
    def __init__(self):
        pass

    @abstractmethod
    def test_dag(self):
        @abstractmethod
        def _extract(self):
            pass

        @abstractmethod
        def _transform(self):
            pass

        @abstractmethod
        def _load(self):
            pass
Even if someone creates a new class that inherits from AbstractDag and implements test_dag but not the nested methods _extract, _transform or _load, the class can be instantiated without error, which shouldn't be possible.
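The reason is that @abstractmethod only sets a marker (__isabstractmethod__) on the decorated function; the actual enforcement is done by ABCMeta, which scans only the attributes of the class body at class-creation time. The nested functions are local variables of test_dag, created only when test_dag is called, so ABCMeta never sees them. A minimal demonstration:

```python
from abc import ABC, abstractmethod

class AbstractDag(ABC):
    @abstractmethod
    def test_dag(self):
        @abstractmethod
        def _extract(self):
            pass
        # _transform and _load behave the same way: they are local names
        # inside test_dag, not class attributes, so ABCMeta ignores them.

# Only class-level abstract methods are collected:
print(AbstractDag.__abstractmethods__)  # frozenset({'test_dag'})
```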
I cannot avoid nesting the methods, since the Airflow 2.0 TaskFlow API requires a main DAG function with all actual tasks defined as nested functions within it.
So my question is: how can I build an interface using the abstractmethod decorator for nested methods as well?
Or is there a better way to ensure each DAG contains these methods using a different approach?
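One pattern I have been considering is a template-method-style sketch (all names here, like EtlDag and build_dag, are hypothetical, and the sketch is not tested against Airflow itself): declare extract, transform and load as class-level abstract methods, where ABCMeta can enforce them, and let a concrete method assemble the DAG by calling them. In real Airflow code, each call inside build_dag would be wrapped with @task inside a @dag-decorated function; plain calls are used here to keep the sketch runnable.

```python
from abc import ABC, abstractmethod

class EtlDag(ABC):
    """The abstract hooks live at class level, so ABCMeta enforces them."""

    @abstractmethod
    def extract(self):
        ...

    @abstractmethod
    def transform(self, data):
        ...

    @abstractmethod
    def load(self, data):
        ...

    def build_dag(self):
        # In real Airflow code, each step would be a @task inside a
        # @dag-decorated function; plain calls keep this self-contained.
        data = self.extract()
        data = self.transform(data)
        return self.load(data)

class MyDag(EtlDag):
    def extract(self):
        return [1, 2, 3]

    def transform(self, data):
        return [x * 2 for x in data]

    def load(self, data):
        return sum(data)

print(MyDag().build_dag())  # → 12
```

A subclass that omits any of the three methods cannot be instantiated, which is exactly the guarantee the nested approach fails to provide.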