
Following Airflow tutorial here.

Problem: The webserver returns the following error

Broken DAG: [/usr/local/airflow/dags/test_operator.py] cannot import name MyFirstOperator

Notes: The directory structure looks like this:

airflow_home
├── airflow.cfg
├── airflow.db
├── dags
│   └── test_operators.py  
├── plugins
│   └── my_operators.py   
└── unittests.cfg

I am attempting to import the plugin in 'test_operators.py' like this:

from airflow.operators import MyFirstOperator

The code is all the same as what is found in the tutorial.

Christopher Carlson
  • I've never used Airflow before. But have you installed the requirements for the project with pip? – cbll May 11 '17 at 06:28
  • @cbll -- yes, everything was installed according to docs : https://airflow.incubator.apache.org/installation.html – Christopher Carlson May 11 '17 at 06:37
  • as an aside I see a rather similar tutorial here: https://technofob.com/2019/05/30/get-started-developing-workflows-with-apache-airflow/ – Robert Lugg Jun 04 '19 at 22:39
  • btw also look at https://stackoverflow.com/questions/43380679/get-pycharm-to-see-dynamically-generated-python-modules for how to make PyCharm understand the code. – Ustaman Sangat Sep 04 '19 at 18:00
  • The approach outline at [astronomer.io](https://www.astronomer.io/guides/airflow-importing-custom-hooks-operators/) (see answer by @Bjorn), works well. Also, I did **NOT** have to restart any services when new operators were added to the `plugins` folder or new dags were added to `dags` folder. _Note: tested on Amazon Fargate with EFS to sync `dags` and `plugins` across webserver, scheduler and worker containers._ – Arsalan Jumani May 14 '20 at 23:15

12 Answers

23

After struggling with the Airflow documentation and trying some of the answers here without success, I found this approach from astronomer.io.

As they point out, building an Airflow Plugin can be confusing and perhaps not the best way to add hooks and operators going forward.

Custom hooks and operators are a powerful way to extend Airflow to meet your needs. There is, however, some confusion on the best way to implement them. According to the Airflow documentation, they can be added using Airflow's Plugins mechanism. This, however, overcomplicates the issue and leads to confusion for many people. Airflow is even considering deprecating the Plugins mechanism for hooks and operators going forward.

So instead of messing around with the Plugins API I followed Astronomer's approach, setting up Airflow as shown below.

dags
└── my_dag.py               (contains dag and tasks)
plugins
├── __init__.py
├── hooks
│   ├── __init__.py
│   └── mytest_hook.py      (contains class MyTestHook)
└── operators
    ├── __init__.py
    └── mytest_operator.py  (contains class MyTestOperator)

With this approach, all the code for my operator and hook lives entirely in their respective files - and there's no confusing plugin file. All the __init__.py files are empty (unlike some equally confusing approaches of putting Plugin code in some of them).

For the imports needed, consider how Airflow actually uses the plugins directory:

When Airflow is running, it will add dags/, plugins/, and config/ to PYTHONPATH

This means that doing from airflow.operators.mytest_operator import MyTestOperator probably isn't going to work. Instead, from operators.mytest_operator import MyTestOperator is the way to go (note how this mirrors from directory/file.py import Class in my setup above).
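This sys.path behaviour can be demonstrated with plain Python, no Airflow installed. The sketch below rebuilds the operators/ package from the layout above in a temporary directory (file and class names mirror that layout and are otherwise arbitrary):

```python
import sys
import tempfile
from pathlib import Path

# Rebuild the plugins/operators package from the layout above
# in a throwaway directory.
plugins = Path(tempfile.mkdtemp()) / "plugins"
operators_pkg = plugins / "operators"
operators_pkg.mkdir(parents=True)
(operators_pkg / "__init__.py").touch()  # empty, as in the layout
(operators_pkg / "mytest_operator.py").write_text(
    "class MyTestOperator:\n"
    "    def __init__(self, task_id):\n"
    "        self.task_id = task_id\n"
)

# Airflow effectively does this step for you at startup: the plugins
# folder itself goes on the import path, which is why no 'airflow.'
# prefix is needed.
sys.path.insert(0, str(plugins))

from operators.mytest_operator import MyTestOperator

op = MyTestOperator(task_id="mytest_task")
print(op.task_id)  # mytest_task
```

If the plugins folder is missing from the path, the same import raises ModuleNotFoundError, which matches the kind of errors described in this question.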

Working snippets from my files are shown below.

my_dag.py:

from airflow import DAG
from operators.mytest_operator import MyTestOperator
default_args = {....}
dag = DAG(....)
....
mytask = MyTestOperator(task_id='mytest_task', dag=dag)
....

mytest_operator.py:

from airflow.models import BaseOperator
from hooks.mytest_hook import MyTestHook

class MyTestOperator(BaseOperator):
    ....
    hook = MyTestHook(....)
    ....

mytest_hook.py:

class MyTestHook:
    ....

This worked for me and was much simpler than trying to subclass AirflowPlugin. However, it might not work for you if you want changes to the webserver UI:

Note: The Plugins mechanism still must be used for plugins that make changes to the webserver UI.

As an aside, the errors I was getting before this (that are now resolved):

ModuleNotFoundError: No module named 'mytest_plugin.hooks.mytest_hook'
ModuleNotFoundError: No module named 'operators.mytest_plugin'
BjornO
  • I think this was the best approach. Was a restart required for the scheduler and/or webserver? I didn't see that mentioned in the astronomer.io article? – Gabe Apr 01 '20 at 02:56
  • I confirm that this method works with a server and webUI restart (maybe the webUI is useless, but I did both). – Ragnar Jun 16 '20 at 12:43
  • I can't recall if I restarted the airflow-scheduler service or not. Possibly :) Restarting the airflow webserver service shouldn't be necessary. FYI there are Airflow separate worker processes (which pick up tasks out of a queue produced by the Scheduler), and these could become stale. If in doubt, restart the scheduler (and double check for any stale worker processes between stop and start). This assumes using the Local / Sequential Executor which I was using, not sure for a distributed setup eg using Celery workers. – BjornO Jul 24 '20 at 05:56
  • Great answer. This worked for me. Thanks. – jignatius Sep 03 '21 at 20:35
  • I can confirm this solution worked for me on Google Cloud Composer 2.1.5, Airflow 2.4.3. Thank you! – Lorenzo Feb 15 '23 at 14:50
10

I use Airflow 1.10. If it's a custom operator that you want to import, you can upload it to the Airflow plugins folder, and then in the DAG specify the import as:

from [filename] import [classname]

where filename is the name of your plugin file and classname is the name of your class.

For example, if the name of your file is my_first_plugin and the name of the class is MyFirstOperator, then the import would be:

from my_first_plugin import MyFirstOperator

This worked for me on Airflow 1.10. Hope this helps!

Sneha K
  • While this works and obviously is simpler, I wonder why Airflow recommends the Plugin machinery, i.e. having a `plugins/__init__.py` with `class MyPlugin(AirflowPlugin): name = 'my_first_plugin' operators = [MyFirstOperator]` The only "advantage" I see is that then you'd import the plugin as `from airflow.operators.my_first_plugin import MyFirstOperator` – Ustaman Sangat Sep 04 '19 at 17:57
9

Airflow version 2 introduced a new mechanism for plugin management as stated in their official documentation:

Changed in version 2.0: Importing operators, sensors, hooks added in plugins via airflow.{operators,sensors,hooks}.<plugin_name> is no longer supported, and these extensions should just be imported as regular python modules. For more information, see: Modules Management and Creating a custom Operator

All you need to do to manage your Python code is put it in the plugins folder and then address files from that point. Suppose you have written TestClass in the file test.py located at the path $AIRFLOW_HOME/plugins/t_plugin/operators/test.py; in the DAG file you can import it this way:

from t_plugin.operators.test import TestClass
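As a plain-Python illustration (no Airflow needed), the sketch below recreates that $AIRFLOW_HOME/plugins/t_plugin/operators/test.py layout in a temporary directory and shows the import resolving once the plugins folder is on sys.path, which is what Airflow 2 does at startup. The body of TestClass is a placeholder, since the answer does not show its contents:

```python
import sys
import tempfile
from pathlib import Path

# Recreate plugins/t_plugin/operators/test.py; TestClass is a stand-in.
plugins = Path(tempfile.mkdtemp()) / "plugins"
pkg = plugins / "t_plugin" / "operators"
pkg.mkdir(parents=True)
(plugins / "t_plugin" / "__init__.py").touch()
(pkg / "__init__.py").touch()
(pkg / "test.py").write_text("class TestClass:\n    greeting = 'hello'\n")

# Airflow 2 puts the plugins folder on sys.path, so files under it
# import as regular Python modules; no plugin class is required.
sys.path.insert(0, str(plugins))

from t_plugin.operators.test import TestClass

print(TestClass.greeting)  # hello
```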
smbanaei
8

The article does it like this:

class MyFirstPlugin(AirflowPlugin):
    name = "my_first_plugin"
    operators = [MyFirstOperator]

Instead use:

class MyFirstPlugin(AirflowPlugin):
    name = "my_first_plugin"
    operators = [MyFirstOperator]
    # A list of class(es) derived from BaseHook
    hooks = []
    # A list of class(es) derived from BaseExecutor
    executors = []
    # A list of references to inject into the macros namespace
    macros = []
    # A list of objects created from a class derived
    # from flask_admin.BaseView
    admin_views = []
    # A list of Blueprint object created from flask.Blueprint
    flask_blueprints = []
    # A list of menu links (flask_admin.base.MenuLink)
    menu_links = []

Also don't use:

from airflow.operators import MyFirstOperator

According to the airflow article on plugins, it should be:

from airflow.operators.my_first_plugin import MyFirstOperator

If that doesn't work try:

from airflow.operators.my_operators import MyFirstOperator

If that doesn't work, check your web server log on startup for more information.

jhnclvr
  • Thanks, I tried this already - under import, it raises 'no module named 'my_first_plugin', 'my_operators'. – Christopher Carlson May 12 '17 at 03:15
  • Which version of airflow are you using? Can you upgrade to 1.8 if it's 1.7? – jhnclvr May 12 '17 at 14:39
  • For 1.8 you can find this hint in the [source code](https://github.com/apache/incubator-airflow/blob/32a26d84b679a54add43092d0bdb77350dcbaeaf/airflow/operators/__init__.py#L102): Importing plugin operator ... directly from 'airflow.operators' has been deprecated. Please import from 'airflow.operators.[plugin_module]' instead. Support for direct imports will be dropped entirely in Airflow 2.0. – Christoph Hösler Oct 06 '17 at 17:15
  • You don't need to specify the empty lists for all those other plugin types. The `AirflowPlugin` class that you inherit from already defaults all of them to empty lists https://github.com/apache/incubator-airflow/blob/master/airflow/plugins_manager.py#L36 – Davos Nov 22 '17 at 05:59
  • The name property of the subclass of AirflowPlugin will become the module name. e.g. if `name = "my_first_plugin"` then in the dag use `from airflow.operators.my_first_plugin import MyFirstOperator`. `my_first_plugin` definitely won't work. As @ChristophHösler mentioned, the old way `from airflow.operators import MyFirstOperator` works, but will be removed as it pollutes the namespace. New way: https://github.com/apache/incubator-airflow/blob/master/airflow/operators/__init__.py#L107 and old way https://github.com/apache/incubator-airflow/blob/master/airflow/operators/__init__.py#L116 – Davos Nov 22 '17 at 07:04
  • I used the proper way to import the plugin but still no luck. Restarted the webserver but no luck again. Then I just rebooted the server running airflow and our local executor. It worked. – Jean-Christophe Rodrigue Mar 27 '18 at 01:07
  • As of today and using airflow 1.10, the format "from airflow.operators import MyFirstOperator" has worked for me to load a Sensor. – Picarus Sep 17 '18 at 07:35
  • Not working on 1.10.10 even after restarts. It's weird that I can import just fine in the Python console, but it fails on the interface. – Julio Batista Silva May 22 '20 at 18:58
  • I was using `1.10.9` and `from airflow.operators.my_first_plugin import MyFirstOperator` method was working from me and now I upgraded to `2.0.1` and I am getting `ModuleNotFoundError: No module named 'airflow.operators.my_first_plugin'`. Any help regarding that? – saadi Mar 23 '21 at 21:31
6

I restarted the webserver, and now everything works fine.

Here is what I think might have happened:

  1. Before I started with the tutorial example, I tried running my own plugin and dag. There was a minor syntax error on the first run that I fixed, however after the fix I started getting the 'cannot import name' error.
  2. I deleted the plugin and dag, and tried using the one from the tutorial to see what was going on.

My guess is that the error from step 1 somehow affected step 2.

Christopher Carlson
  • In my experience, you need to restart the webserver when you add/modify any plugins. – Daniel Lee Jun 20 '17 at 06:39
  • @Daniel Lee made a good point here, you need to restart your webserver and scheduler as well, at least this worked for me on Airflow 1.8.2 – dorvak Dec 28 '17 at 09:53
  • this is correct on 1.8.2... need to test on other versions. – root Jun 06 '18 at 20:22
  • @DanielLee what's the best way to restart the server? – howMuchCheeseIsTooMuchCheese Jun 30 '18 at 16:02
  • Ctrl-c to kill it and then start it again. @howMuchCheeseIsTooMuchCheese – Daniel Lee Jul 02 '18 at 06:17
  • Just a quick tip: when you add anything to a plug-in, you usually need to restart the web server. When the webserver restarts the very first few lines in stdout (if the webserver is in DEBUG logging mode) will be the plugins import. If there are any issues with your plugin syntax they will show up there. Also important to note, do not put any expensive operations in the init function of your operator, these will be executed every time the scheduler loops. – trejas Jan 20 '19 at 17:05
2

I had to update the plugin path in file airflow.cfg in order to fix the problem.

Where your Airflow plugins are stored:

plugins_folder = /airflow/plugins
Roy Scheffers
1

I encountered the same error while following these tutorials.

My mistake, however, was that I had used a space character (' ') in task_id, which isn't supported by Airflow.

Clearly the error didn't point towards the actual problem. Restarting both Airflow scheduler and webserver then showed the correct error message on WebUI.
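A quick way to catch this up front is to validate task IDs against a pattern like the one Airflow applies internally (the sketch below is an approximation for illustration; the real check lives in Airflow's helpers, linked in the comment below):

```python
import re

# Approximation of Airflow's task_id/dag_id check: alphanumerics,
# underscores, dashes and dots only.
KEY_REGEX = re.compile(r"^[\w.-]+$")

def is_valid_task_id(task_id: str) -> bool:
    return bool(KEY_REGEX.match(task_id))

print(is_valid_task_id("my_first_task"))  # True
print(is_valid_task_id("MyTest Task"))    # False: contains a space
```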

y2k-shubham
  • From [source-code](https://github.com/apache/incubator-airflow/blob/master/airflow/utils/helpers.py#L51), it is clear that `dag_id`s & `task_id`s can only contain underscores, dashes and dots (`_`, `-`, `.`) apart from *alphanumeric* characters – y2k-shubham Aug 01 '18 at 11:27
1

As per the docs -

The python modules in the plugins folder get imported, and hooks, operators, sensors, macros, executors and web views get integrated to Airflow’s main collections and become available for use.

This works fine in version 1.10.1.

Sachin Kolige
0

In my case I managed to make a custom operator with the following steps:

  1. Airflow 1.10.3
  2. In the DAG file: from airflow.operators import MacrosPostgresOperator
  3. In the ~/airflow/plugins folder I have a Python file custom_operator.py, and the code is pretty simple:

from airflow.plugins_manager import AirflowPlugin
from airflow.operators.postgres_operator import PostgresOperator

class MacrosPostgresOperator(PostgresOperator):
    template_fields = ('sql', 'parameters')

class MacrosFirstPlugin(AirflowPlugin):
    name = "macros_first_plugin"
    operators = [MacrosPostgresOperator]
alexopoulos7
0

You must stop (CTRL-C) and restart your Airflow web server and scheduler.

RajashekharC
0

Let's say the following is the custom plugin that you have implemented in my_operators.py:

class MyFirstPlugin(AirflowPlugin):
    name = "my_first_plugin"
    operators = [MyFirstOperator]

Then, as per the Airflow documentation, you have to import using the following structure:

from airflow.{type, like "operators", "sensors"}.{name specified inside the plugin class} import *

So, in your case, you should import it like this:

from airflow.operators.my_first_plugin import MyFirstOperator
Manoj Kumar S
-1

I faced the same issue following the same tutorial. What worked for me was to replace the import of MyFirstOperator with:

from airflow_home.plugins.my_operators import MyFirstOperator
Red1