
Task: Write unit tests that import DAGs and check their validity in the CI pipeline, similar to https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#unit-tests

Problem: My tests pass locally but fail in the pipeline (failing to retrieve a table from a database). I am not sure how I should configure Airflow for unit testing, or whether I need a full Airflow instance created in the pipeline.

Questions

  1. Do I need to set up a full Airflow instance, with a webserver and an initialised database, just to check that the DAGs import?
  2. Does DagBag always require a database?
  3. If it does, what is the purpose of https://airflow.apache.org/docs/apache-airflow/stable/howto/use-test-config.html? (A sketch of the setup I mean follows.)
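
For reference, this is a sketch of the kind of setup I am asking about (tests/conftest.py). The environment variables and the initdb() call are my guesses from reading the docs, and the AIRFLOW_HOME path is just a placeholder:

import os

import pytest

# Point Airflow at a throwaway home directory before anything imports airflow
os.environ["AIRFLOW_HOME"] = "/tmp/airflow_unit_tests"  # placeholder path
os.environ["AIRFLOW__CORE__UNIT_TEST_MODE"] = "True"
os.environ["AIRFLOW__CORE__LOAD_EXAMPLES"] = "False"


@pytest.fixture(scope="session", autouse=True)
def airflow_db() -> None:
    # Create the metadata tables (dag, dag_run, ...) in the test database,
    # so queries like the one in the error below have a schema to hit
    from airflow.utils.db import initdb

    initdb()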

Test code (tests/test_validity.py):

import pytest

from airflow.models import DagBag


@pytest.fixture()
def dagbag(dag_path: str) -> DagBag:
    # dag_path is a str fixture (defined in conftest.py) pointing at the DAG folder
    return DagBag(dag_folder=dag_path)


def test_dagbag_imports(dagbag: DagBag) -> None:
    assert dagbag.import_errors == {}

Error:

FAILED tests/test_validity.py::test_sync_and_patch - sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: dag
[SQL: SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_parsed_time AS dag_last_parsed_time, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.processor_subdir AS dag_processor_subdir, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag.timetable_description AS dag_timetable_description, dag.max_active_tasks AS dag_max_active_tasks, dag.max_active_runs AS dag_max_active_runs, dag.has_task_concurrency_limits AS dag_has_task_concurrency_limits, dag.has_import_errors AS dag_has_import_errors, dag.next_dagrun AS dag_next_dagrun, dag.next_dagrun_data_interval_start AS dag_next_dagrun_data_interval_start, dag.next_dagrun_data_interval_end AS dag_next_dagrun_data_interval_end, dag.next_dagrun_create_after AS dag_next_dagrun_create_after 
FROM dag 
WHERE dag.dag_id = ?
 LIMIT ? OFFSET ?]
[parameters: ('sync_and_patch', 1, 0)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

Pipeline:

unittest:
  stage: test
  before_script:
    - *init_env_before_script
  script:
    - $HOME/.local/bin/poetry install
    - $HOME/.local/bin/poetry run pytest -v --tb=line
  variables:
    AUTH0_DOMAIN: $AUTH0_DOMAIN_STAGING
    AUTH0_CLIENT_ID: $AUTH0_CLIENT_ID_STAGING
    AUTH0_API_IDENTIFIER: $AUTH0_API_IDENTIFIER_STAGING
  coverage: /^TOTAL.*\s+(\d+\%)$/
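
If a metadata database really is required, I assume the fix is to initialise a throwaway SQLite one before pytest runs (e.g. running airflow db init in the script, or the initdb() call sketched above), but I would like to avoid that if DagBag can validate imports without a database.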