Task: Write unit tests that import the DAGs and check their validity in the CI pipeline, similar to https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#unit-tests
Problem: My tests pass locally but fail in the pipeline (failing to retrieve a table from a database). I am not sure how Airflow should be configured for unit testing, or whether a full instance needs to be created in the pipeline.
Questions:
- Do I need to set up a full Airflow instance, with a webserver and an initialised database, just to check DAG imports?
- Does DagBag always require a database?
- ...if so, what is the purpose of https://airflow.apache.org/docs/apache-airflow/stable/howto/use-test-config.html?
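For context, my reading of that page is that setting unit test mode before anything imports Airflow should be enough. Below is a sketch of what I assume it intends; AIRFLOW__CORE__UNIT_TEST_MODE and AIRFLOW__CORE__LOAD_EXAMPLES follow the standard AIRFLOW__SECTION__KEY mapping from the configuration reference, and the paths are placeholders:

tests/conftest.py (sketch)
import os

# Airflow reads its config at import time, so these must be set
# before the first airflow import anywhere in the test session.
os.environ["AIRFLOW_HOME"] = "/tmp/airflow_test"          # placeholder path
os.environ["AIRFLOW__CORE__UNIT_TEST_MODE"] = "True"      # use Airflow's test defaults
os.environ["AIRFLOW__CORE__LOAD_EXAMPLES"] = "False"      # skip the bundled example DAGs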
Test code (tests/test_validity.py):
import pytest
from airflow.models import DagBag

@pytest.fixture()
def dagbag(dag_path: str) -> DagBag:
    # dag_path is supplied by a fixture defined elsewhere (e.g. conftest.py)
    return DagBag(dag_folder=dag_path)

def test_dagbag_imports(dagbag: DagBag) -> None:
    assert dagbag.import_errors == {}
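For what it's worth, DagBag also has a read_dags_from_db flag (False by default per the API docs), so I would expect an import check that is explicit about parsing files, like the sketch below, to avoid the metadata database entirely ("dags/" is a placeholder path):

from airflow.models import DagBag

def test_dagbag_imports_from_files() -> None:
    # Parse the DAG files directly instead of reading serialised DAGs
    # from the metadata DB; read_dags_from_db=False is the default anyway.
    dagbag = DagBag(dag_folder="dags/", include_examples=False, read_dags_from_db=False)
    assert dagbag.import_errors == {}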
Error:
FAILED tests/test_validity.py::test_sync_and_patch - sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: dag
[SQL: SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_parsed_time AS dag_last_parsed_time, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.processor_subdir AS dag_processor_subdir, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag.timetable_description AS dag_timetable_description, dag.max_active_tasks AS dag_max_active_tasks, dag.max_active_runs AS dag_max_active_runs, dag.has_task_concurrency_limits AS dag_has_task_concurrency_limits, dag.has_import_errors AS dag_has_import_errors, dag.next_dagrun AS dag_next_dagrun, dag.next_dagrun_data_interval_start AS dag_next_dagrun_data_interval_start, dag.next_dagrun_data_interval_end AS dag_next_dagrun_data_interval_end, dag.next_dagrun_create_after AS dag_next_dagrun_create_after
FROM dag
WHERE dag.dag_id = ?
LIMIT ? OFFSET ?]
[parameters: ('sync_and_patch', 1, 0)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Pipeline:
unittest:
  stage: test
  before_script:
    - *init_env_before_script
  script:
    - $HOME/.local/bin/poetry install
    - $HOME/.local/bin/poetry run pytest -v --tb=line
  variables:
    AUTH0_DOMAIN: $AUTH0_DOMAIN_STAGING
    AUTH0_CLIENT_ID: $AUTH0_CLIENT_ID_STAGING
    AUTH0_API_IDENTIFIER: $AUTH0_API_IDENTIFIER_STAGING
  coverage: /^TOTAL.*\s+(\d+\%)$/
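The only workaround I can see so far is creating the metadata tables before the tests run, which feels heavier than an import check should need. A sketch, assuming airflow.utils.db.initdb is the programmatic counterpart of the "airflow db init" CLI command:

tests/conftest.py (workaround sketch)
import os

# Placeholder path; must be set before the first airflow import.
os.environ.setdefault("AIRFLOW_HOME", "/tmp/airflow_test")

import pytest
from airflow.utils import db

@pytest.fixture(scope="session", autouse=True)
def airflow_metadata_db():
    # Create the sqlite metadata tables once per test session,
    # equivalent to running "airflow db init" in the CI job before pytest.
    db.initdb()

Is this really necessary just to validate DAG imports?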