
Problem:

Running airflow db init in my virtualenv fails to import my DAG with ModuleNotFoundError: No module named 'pytrends', even though pip freeze shows that pytrends is installed:

❯ airflow db init
DB: sqlite:////Users/.../airflow.db
[2023-07-24T23:03:08.958+0100] {migration.py:213} INFO - Context impl SQLiteImpl.
[2023-07-24T23:03:08.961+0100] {migration.py:216} INFO - Will assume non-transactional DDL.
[2023-07-24T23:03:09.058+0100] {migration.py:213} INFO - Context impl SQLiteImpl.
[2023-07-24T23:03:09.058+0100] {migration.py:216} INFO - Will assume non-transactional DDL.
[2023-07-24T23:03:09.059+0100] {db.py:1591} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
ERROR [airflow.models.dagbag.DagBag] Failed to import: /Users/.../dags/test_dag.py
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/airflow/models/dagbag.py", line 346, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/.../airflow/dags/test_dag.py", line 4, in <module>
    from airflow_package.components.extractors import PyTrendsKeywordExtractor
  File "/Users/.../dags/airflow_package/components/extractors.py", line 3, in <module>
    from pytrends.request import TrendReq
ModuleNotFoundError: No module named 'pytrends'
WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
Initialization done
❯ pip freeze | grep pytrends
pytrends==4.9.2
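
One detail in the traceback above: Airflow is loaded from /opt/homebrew/lib/python3.11/site-packages, which looks like a Homebrew-wide install rather than a path inside a virtualenv. A minimal diagnostic sketch, assuming the airflow command and pip may resolve to different interpreters (the helper name describe_environment is made up for illustration), is to run something like this with the interpreter that actually runs Airflow:

```python
import importlib.util
import sys


def describe_environment(module_name="pytrends"):
    """Report which interpreter is running and whether it can find a module."""
    # find_spec returns None for a top-level module that is not importable.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # In a venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the reported executable or prefix differs from the virtualenv where pip freeze lists pytrends, the two commands are probably using different Python installations.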

Context:

I have written some custom DAGs that use the external packages pytrends and google-search-results. Using setuptools, I have registered my custom package, airflow_package, which contains classes such as PyTrendsKeywordExtractor that depend on pytrends.

#setup.py
from setuptools import setup, find_packages

setup(
   name='airflow_package',
   version='1.2',
   description='A useful module',
   author='Man Foo',
   author_email='foomail@foo.example',
   packages=find_packages(),  #same as name
   install_requires=["pytrends", "google-search-results", "python-dotenv", "pandas"], #external packages as dependencies
)
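
Note that install_requires only takes effect when the package itself is installed with pip into a given environment. As a rough sketch (the helper installed_versions is hypothetical, and the names are copied from the setup.py above), the following checks which of those distributions are visible to whichever interpreter runs it:

```python
from importlib import metadata

# Distribution names copied from install_requires in setup.py.
REQUIRED = ["pytrends", "google-search-results", "python-dotenv", "pandas"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED in this interpreter'}")
```

Running it once with the virtualenv's python and once with the Python under /opt/homebrew would show whether the dependencies exist in both places or only one.
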
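#Dag Code (test_dag.py)

import json
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow_package.components.extractors import PyTrendsKeywordExtractor

with DAG(dag_id="my_test_dag", schedule="@hourly", start_date=datetime(2023, 7, 24), catchup=False) as dag:

    @task.python(task_id="keyword_extractor")
    def keyword_extractor(**kwargs):
        # The task instance is injected into the context kwargs by Airflow
        ti = kwargs['ti']
        extractor = PyTrendsKeywordExtractor(kw_list=["Nike", "Canada Goose", "Sports Direct"], fuzzy_find_key="company")
        keyword_dict = extractor.extract()
        # Push the extracted keywords to XCom as a JSON string
        ti.xcom_push("target_keywords", json.dumps(keyword_dict))
    ...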

I have the following file structure:

airflow/
├── airflow.cfg
├── airflow.db
├── dags
│   ├── __pycache__
│   │   └── test_dag.cpython-311.pyc
│   ├── airflow_package
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   └── __init__.cpython-311.pyc
│   │   ├── components
│   │   │   ├── __init__.py
│   │   │   ├── __pycache__
│   │   │   │   ├── __init__.cpython-311.pyc
│   │   │   │   └── extractors.cpython-311.pyc
│   │   │   ├── base.py
│   │   │   ├── extractors.py
│   │   │   └── new_extractors.py
│   │   └── settings.py
│   └── test_dag.py
├── logs
│   └── scheduler
│       ├── 2023-07-24
│ 
└── webserver_config.py

Docs followed:

The Airflow documentation is proving pretty hard to follow. So far I have tried the following:

module structure: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/modules_management.html#adding-directories-to-the-pythonpath
