I am using Airflow 2.0's TaskFlow API to generate DAGs that orchestrate ETL jobs.

Airflow 2.0 doesn't seem to provide a framework for generating DAGs according to the DRY principle. Each DAG has to be defined in a separate file, which involves a lot of copy/paste. In the ETL context the structure of a DAG is always the same; only the config parameters change with the specific use case. A standard DAG written with Airflow 2.0's TaskFlow API looks like this:

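from airflow.decorators import dag, task

# default_args is assumed to be defined elsewhere in the DAG file
@dag(schedule_interval='30 12 * * *', default_args=default_args, catchup=False)
def test_dag():
    @task
    def _extract():
        pass

    @task
    def _transform():
        pass

    @task
    def _load():
        pass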

Airflow 2.0 does not use classes or an OOP approach to generate these DAGs. But since the structure of each DAG is the same in the ETL context and always has to contain the same functions "extract", "transform" and "load", I thought it would be a good idea to use an OOP approach and set up an abstract base class that defines "extract", "transform" and "load" as abstract methods, thereby ensuring that they are contained in each newly created DAG.

Therefore I am trying to build a class interface using the abstract factory pattern, based on Python's abc (Abstract Base Classes) module.

This generally works fine for abstract methods that are not nested, e.g.:

from abc import ABC, abstractmethod 

class AbstractDag(ABC):
    def __init__(self):
        pass
        
    @abstractmethod
    def test_dag(self):
        pass

When someone tries to instantiate a class that inherits from AbstractDag without having defined the abstract method "test_dag", a TypeError is raised, as expected, since "test_dag" needs to be implemented as specified in AbstractDag. So all good here.

But when I try the same approach with nested methods, as required for the Airflow DAG, it does not work. No check is run: even if a nested method is missing, no error is raised.
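To make the working case concrete, here is a minimal sketch of that check. The subclass names IncompleteDag and CompleteDag are hypothetical and only serve as illustration; note that the error appears when the subclass is instantiated, not when it is defined.

```python
from abc import ABC, abstractmethod


class AbstractDag(ABC):
    @abstractmethod
    def test_dag(self):
        pass


class IncompleteDag(AbstractDag):
    """Hypothetical subclass that forgets to implement test_dag."""


class CompleteDag(AbstractDag):
    """Hypothetical subclass that implements test_dag."""

    def test_dag(self):
        return "dag built"


# Defining IncompleteDag is fine; instantiating it is what fails.
try:
    IncompleteDag()
    incomplete_raised = False
except TypeError:
    incomplete_raised = True

complete_result = CompleteDag().test_dag()
```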

from abc import ABC, abstractmethod   

class AbstractDag(ABC):
    def __init__(self):
        pass
        
    @abstractmethod
    def test_dag(self):
        @abstractmethod
        def _extract(self):
            pass
        @abstractmethod
        def _transform(self):
            pass
        @abstractmethod
        def _load(self):
            pass
        pass

Even if someone creates a new class that inherits from AbstractDag and does not define the nested methods _extract, _transform or _load, the class can still be instantiated, which shouldn't be possible.

I cannot avoid nesting the methods, since the Airflow 2.0 TaskFlow API requires a main DAG function, with all actual tasks defined as nested functions inside it.

So my question is: how can I build an interface that uses the abstractmethod decorator for nested methods as well?

Or is there a better way to ensure that each DAG contains these methods?
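A small sketch of what I observe, with a hypothetical subclass MyDag: the abc machinery only records names found in the class body, so the nested functions never show up in `__abstractmethods__` (they are just local variables that would exist only while test_dag runs).

```python
from abc import ABC, abstractmethod


class AbstractDag(ABC):
    @abstractmethod
    def test_dag(self):
        # These are local functions of test_dag, not attributes of the class.
        @abstractmethod
        def _extract(self):
            pass

        @abstractmethod
        def _transform(self):
            pass

        @abstractmethod
        def _load(self):
            pass


class MyDag(AbstractDag):
    """Hypothetical subclass: overrides test_dag but defines no nested tasks."""

    def test_dag(self):
        pass


# Only the class-level name is tracked as abstract.
tracked = AbstractDag.__abstractmethods__

# Instantiation succeeds although _extract, _transform and _load are missing.
instance = MyDag()
```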
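One direction that might work, sketched under assumptions rather than as a tested Airflow solution: keep extract, transform and load as ordinary class-level abstract methods, where abc can check them, and let a concrete method of the base class do the nesting. The names EtlDag, SalesEtl, NoLoadEtl and build are hypothetical. The dag and task decorators are passed in as parameters so the sketch runs without Airflow installed; in a real DAG file they would presumably be airflow.decorators.task and a configured airflow.decorators.dag(...), which I have not verified.

```python
from abc import ABC, abstractmethod


class EtlDag(ABC):
    """Hypothetical base class: the ETL steps are class-level abstract methods."""

    @abstractmethod
    def extract(self):
        ...

    @abstractmethod
    def transform(self, data):
        ...

    @abstractmethod
    def load(self, data):
        ...

    def build(self, dag, task):
        """Nest the three steps inside one DAG function (template method).

        dag and task are decorators supplied by the caller (assumption:
        Airflow's decorators in real use, plain pass-through functions here).
        """

        @dag
        def etl():
            @task
            def _extract():
                return self.extract()

            @task
            def _transform(data):
                return self.transform(data)

            @task
            def _load(data):
                return self.load(data)

            _load(_transform(_extract()))

        return etl


class SalesEtl(EtlDag):
    """Hypothetical complete implementation."""

    def extract(self):
        return [1, 2, 3]

    def transform(self, data):
        return [x * 2 for x in data]

    def load(self, data):
        self.loaded = data


class NoLoadEtl(EtlDag):
    """Hypothetical incomplete implementation: load is missing."""

    def extract(self):
        return []

    def transform(self, data):
        return data


def passthrough(fn):
    """Stand-in decorator so the sketch runs without Airflow."""
    return fn


sales = SalesEtl()
sales.build(dag=passthrough, task=passthrough)()

try:
    NoLoadEtl()
    missing_step_raised = False
except TypeError:
    missing_step_raised = True
```

With this layout the per-DAG file would only contain the subclass and its config, and a missing step fails at instantiation instead of going unnoticed.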

omoshiro
  • You cannot. A method's local scope does not work in the same manner as the class scope when it comes to `abstractmethod`. Your best bet is to move the `nested_test_method` to either another file (say `utils.py`), or into the scope of the class. – felipe Dec 28 '21 at 14:11
  • Anything happening inside `test_method` is its private implementation detail, none of which can even be accessed from outside. What would you even do with a "nested method"? – deceze Dec 28 '21 at 14:12
  • It's worth noting that the intention of `ABC` and `abstractmethod` is to give the subsequent developers, who are implementing the abstract class, a consistent "base" to develop on top of. Perhaps a nested method makes sense in your head, but someone else might see fit to develop the `test_method` some other way. In other words, there is no point creating nested abstract methods. – felipe Dec 28 '21 at 14:13
  • Thanks for your comments. I added more details to the original question, explaining "why" I need to use nested methods. The API I am using, the Airflow 2.0 TaskFlow API, requires nested methods. Maybe there is a better approach to achieve the same goal? – omoshiro Dec 28 '21 at 20:10
  • ABCs have zero to do with prescribing the inner structure of a function, but the question is laid out well enough that perhaps some other appropriate advice can be given. – deceze Dec 28 '21 at 20:36
  • Thanks for clarifying @deceze. It doesn't need to be an ABC; that was just my first guess, since it seemed closely related. So in case anyone else has an idea based on the specific use case laid out in the question, alternative approaches and suggestions are also highly appreciated. – omoshiro Dec 29 '21 at 10:19
