
I'm working on a container-based Apache Airflow application. My environment is made up of the following components:

  • Airflow Scheduler container
  • Airflow Webserver container
  • Airflow Celery Flower container
  • Airflow Worker container (1)
  • etc.

My understanding of this pattern is that the scheduler and webserver containers only need the dependencies required by Airflow itself, while one or more worker nodes carry everything needed to run my DAGs.

When I try to work this way (for instance, installing and using a module only on the worker node, let's say the crypto module), I get a DAG Import Error in the front end that says: ModuleNotFoundError: No module named 'crypto'.

This makes sense to me, because the scheduler knows that I'll need that module for the execution and throws an error. Despite this, the DAG works correctly, because when it runs on the worker node, all the required dependencies are available.

How can I fix this?

Thanks

Marco Miduri

2 Answers


Currently, you need to keep your dependencies in sync on both the scheduler and the worker.

The scheduler parses DAG files in a separate process (one per DAG file), so if a dependency used in a DAG file is not installed on the scheduler, it records an ImportError in the database, which the webserver then displays.
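
To illustrate why parsing fails, here is a minimal sketch in plain Python. It is not Airflow's actual parsing code; it only shows that loading a file executes its top-level imports. The module name `module_missing_on_scheduler` and the helper `try_parse` are hypothetical and stand in for a worker-only dependency and the scheduler's parsing step.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical DAG file whose top-level import is only installed on the worker.
DAG_SOURCE = "import module_missing_on_scheduler\n"


def try_parse(dag_path):
    """Load a Python file and return the import error text, or None on success."""
    spec = importlib.util.spec_from_file_location("example_dag", dag_path)
    module = importlib.util.module_from_spec(spec)
    try:
        # Executes all top-level code in the file, including its imports.
        spec.loader.exec_module(module)
    except ImportError as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


with tempfile.TemporaryDirectory() as tmp:
    dag_file = Path(tmp) / "example_dag.py"
    dag_file.write_text(DAG_SOURCE)
    import_error = try_parse(dag_file)

print(import_error)
```

Any process that loads the file hits the same error, regardless of whether a task from it is ever executed there.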


kaxil
  • Getting all dependencies into the scheduler is definitely a problem, because some libraries conflict. The correct answer is to use specialized images (when using Kubernetes) with the dependencies loaded when required (at runtime, not at design time). Parsing a DAG should be useful but not blocking, at least when using Airflow on Kubernetes. – jao6693 Nov 04 '22 at 12:20

The above answer by @kaxil is good but seems incomplete, at least according to the documentation here:

Airflow scheduler executes the code outside the Operator’s execute methods

This means that you can avoid this sort of ImportError by replacing top-level imports with local imports inside your Python callables. The referenced documentation explains this in more detail.
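
A minimal sketch of the local-import pattern, assuming the worker-only module is the crypto module from the question. The function name `process_with_crypto` and the call `crypto.encrypt` are hypothetical and only serve as illustration.

```python
import sys


def process_with_crypto(payload: bytes) -> bytes:
    """Task callable: the worker-only dependency is imported when the task runs."""
    # Local import: resolved at execution time on the worker,
    # not when the file containing this function is parsed.
    import crypto  # module from the question, assumed installed on the worker only

    return crypto.encrypt(payload)  # hypothetical function, for illustration


# Defining the callable does not import the module, so parsing this file
# succeeds even where crypto is not installed.
print("crypto" in sys.modules)
```

Such a function can then be passed to a PythonOperator through its python_callable argument, so the import only happens inside the task's execution on the worker.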

Luiz Tauffer