  1. I want to write pytest unit tests in Kedro 0.17.5. They need to perform integrity checks on dataframes created by the pipeline. These dataframes are specified in catalog.yml and have already been persisted successfully with kedro run. The catalog.yml is in conf/base.

  2. I have a test module test_my_dataframe.py in src/tests/pipelines/my_pipeline/.

How can I load the data catalog based on my catalog.yml programmatically from within test_my_dataframe.py in order to properly access my specified dataframes?

Or, for that matter, how can I programmatically load the whole project context (including the data catalog) in order to also execute nodes etc.?

pppery
movingabout

1 Answer

  1. In a unit test, we test only the function under test and mock or patch everything external to it. Check whether you really need the Kedro project context when writing the unit test.

  2. If you really need the project context in a test, you can do something like the following:

from pathlib import Path

from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession

# Path.cwd() assumes the tests are started from the Kedro project root.
with KedroSession.create(package_name="demo", project_path=Path.cwd()) as session:
    context = session.load_context()
    catalog = context.catalog

Alternatively, you can create a pytest fixture so that the context can be reused across tests, with a scope of your choice.

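from pathlib import Path

import pytest
from kedro.framework.session import KedroSession
# _activate_session is a private helper in Kedro 0.17.x; this import path is assumed.
from kedro.framework.session.session import _activate_session


@pytest.fixture
def get_project_context():
    session = KedroSession.create(
        package_name="demo",
        project_path=Path.cwd(),  # assumes pytest runs from the project root
    )
    _activate_session(session, force=True)
    context = session.load_context()
    return context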

The arguments supported by KedroSession.create are documented here: https://kedro.readthedocs.io/en/0.17.5/kedro.framework.session.session.KedroSession.html#kedro.framework.session.session.KedroSession.create

To read more about pytest fixtures and their scopes, see https://docs.pytest.org/en/6.2.x/fixture.html#scope-sharing-fixtures-across-classes-modules-packages-or-session
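Regarding point 1, an integrity check can often be unit tested without any project context by handing it a stand-in for the catalog. The following is a minimal sketch under assumptions: the function `check_unique_ids`, the dataset name `my_dataframe` and the column `id` are hypothetical, and the loaded data is modelled as a list of dicts rather than a real dataframe so that the example needs nothing beyond the standard library.

```python
from unittest.mock import MagicMock


def check_unique_ids(catalog, dataset_name, column="id"):
    """Return True if `column` holds no duplicate values in the loaded dataset."""
    rows = catalog.load(dataset_name)
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def test_check_unique_ids():
    # The mock replaces the real data catalog, so no Kedro project is needed.
    fake_catalog = MagicMock()

    # Case 1: all ids are distinct, so the check passes.
    fake_catalog.load.return_value = [{"id": 1}, {"id": 2}]
    assert check_unique_ids(fake_catalog, "my_dataframe")
    fake_catalog.load.assert_called_once_with("my_dataframe")

    # Case 2: a duplicated id makes the check fail.
    fake_catalog.load.return_value = [{"id": 1}, {"id": 1}]
    assert not check_unique_ids(fake_catalog, "my_dataframe")
```

The design choice here is that the check only depends on an object with a `load` method, which keeps the test fast and independent of files on disk.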

Rahul Kumar
  • Thanks! Regarding your point 1: I was looking into https://greatexpectations.io for data tests. But for my purposes I figured that pytest might be more lightweight. Is https://pypi.org/project/kedro-great/ the recommended way to integrate kedro with greatexpectations currently? The package was last updated in 2020. – movingabout Jun 27 '22 at 07:05
  • There currently isn't a well maintained plug-in available. You can build your own integration with Hooks, but I also am increasingly impressed by pandera which is super easy to integrate with Kedro. – datajoely Jun 27 '22 at 09:12