Problem:
Running airflow db init in my virtualenv gives the following error, even though pip freeze shows that the missing package (pytrends) is installed:
❯ airflow db init
DB: sqlite:////Users/.../airflow.db
[2023-07-24T23:03:08.958+0100] {migration.py:213} INFO - Context impl SQLiteImpl.
[2023-07-24T23:03:08.961+0100] {migration.py:216} INFO - Will assume non-transactional DDL.
[2023-07-24T23:03:09.058+0100] {migration.py:213} INFO - Context impl SQLiteImpl.
[2023-07-24T23:03:09.058+0100] {migration.py:216} INFO - Will assume non-transactional DDL.
[2023-07-24T23:03:09.059+0100] {db.py:1591} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
ERROR [airflow.models.dagbag.DagBag] Failed to import: /Users/.../dags/test_dag.py
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/airflow/models/dagbag.py", line 346, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/.../airflow/dags/test_dag.py", line 4, in <module>
    from airflow_package.components.extractors import PyTrendsKeywordExtractor
  File "/Users/.../dags/airflow_package/components/extractors.py", line 3, in <module>
    from pytrends.request import TrendReq
ModuleNotFoundError: No module named 'pytrends'
WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
Initialization done
❯ pip freeze | grep pytrends
pytrends==4.9.2
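One detail in the output above may be relevant: the traceback shows Airflow being loaded from /opt/homebrew/lib/python3.11/site-packages, which looks like a system-wide Homebrew install rather than a virtualenv, so the airflow command and pip may belong to different interpreters. The following is a minimal diagnostic sketch under that assumption, not a confirmed fix. The function name describe_environment is hypothetical; everything it calls is from the Python standard library.

```python
import importlib.util
import shutil
import sys


def describe_environment(module_name="pytrends"):
    """Collect facts about the interpreter that is running this script."""
    # find_spec returns None when a top-level module cannot be found.
    spec = importlib.util.find_spec(module_name)
    return {
        # Path of the Python binary executing this code.
        "executable": sys.executable,
        # Root of the active environment.
        "prefix": sys.prefix,
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # File the module would be imported from, or None if missing.
        "module_location": spec.origin if spec else None,
        # The airflow executable the shell would pick up, if any.
        "airflow_on_path": shutil.which("airflow"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If airflow_on_path points outside the directory given by prefix, the shell is probably resolving a different Airflow installation than the one the virtualenv's pip manages.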
Context:
I have written some custom DAGs that use the external packages pytrends and google-search-results. Using setuptools, I have registered my custom package, airflow_package, which contains classes such as PyTrendsKeywordExtractor, which uses pytrends.
# setup.py
from setuptools import setup, find_packages

setup(
    name='airflow_package',
    version='1.2',
    description='A useful module',
    author='Man Foo',
    author_email='foomail@foo.example',
    packages=find_packages(),  # same as name
    install_requires=["pytrends", "google-search-results", "python-dotenv", "pandas"],  # external packages as dependencies
)
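Listing dependencies in install_requires only helps if they end up installed into the interpreter that actually runs Airflow. As a rough sketch, the check below asks the current interpreter which of those distributions it can see. The helper name installed_versions is hypothetical; importlib.metadata is part of the standard library (Python 3.8+).

```python
from importlib import metadata

# Distribution names copied from install_requires in setup.py.
REQUIRED = ["pytrends", "google-search-results", "python-dotenv", "pandas"]


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed for the interpreter running this script.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running this once with the virtualenv's python and once with the interpreter Airflow uses would show whether the two environments disagree.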
# DAG code (test_dag.py)
import json
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow_package.components.extractors import PyTrendsKeywordExtractor

with DAG(dag_id="my_test_dag", schedule="@hourly", start_date=datetime(2023, 7, 24), catchup=False) as dag:
    @task.python(task_id="keyword_extractor")
    def keyword_extractor(**kwargs):
        ti = kwargs['ti']
        extractor = PyTrendsKeywordExtractor(kw_list=["Nike", "Canada Goose", "Sports Direct"], fuzzy_find_key="company")
        keyword_dict = extractor.extract()
        ti.xcom_push("target_keywords", json.dumps(keyword_dict))
    ...
I have the following file structure:
airflow/
├── airflow.cfg
├── airflow.db
├── dags
│   ├── __pycache__
│   │   └── test_dag.cpython-311.pyc
│   ├── airflow_package
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   └── __init__.cpython-311.pyc
│   │   ├── components
│   │   │   ├── __init__.py
│   │   │   ├── __pycache__
│   │   │   │   ├── __init__.cpython-311.pyc
│   │   │   │   └── extractors.cpython-311.pyc
│   │   │   ├── base.py
│   │   │   ├── extractors.py
│   │   │   └── new_extractors.py
│   │   └── settings.py
│   └── test_dag.py
├── logs
│   └── scheduler
│       ├── 2023-07-24
│
└── webserver_config.py
Docs followed:
The Airflow documentation is proving pretty hard to follow. So far I have tried the following:
module structure: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/modules_management.html#adding-directories-to-the-pythonpath
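The page linked above is about which directories end up on sys.path. A small sketch for checking where a top-level module would be loaded from on a given search path follows. The helper name locate_on_path is hypothetical; PathFinder is the standard library's path-based finder.

```python
import sys
from importlib.machinery import PathFinder


def locate_on_path(module_name, search_path=None):
    """Return the file a top-level module would load from, or None."""
    # Fall back to the current interpreter's sys.path when none is given.
    path = search_path if search_path is not None else sys.path
    spec = PathFinder.find_spec(module_name, path)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    for name in ("airflow_package", "pytrends"):
        print(f"{name}: {locate_on_path(name)}")
```

Passing an explicit list, for example one containing only the dags folder, shows whether a module is resolved from that directory or from somewhere else.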