
In a Jupyter notebook, when I run session.run(pipeline_name='sim', from_inputs=['measurements', 'params:simulation']) with datasets and parameters specified in catalog.yaml, everything works fine. However, when I run it with a dataset that I added during the session, a ValueError occurs:

>>> ds = GenMsmtsDataSet()
>>> catalog.add('ipy_msmts', ds)
>>> session.run(pipeline_name='sim', from_inputs=['ipy_msmts', 'params:simulation'])
ValueError: Pipeline does not contain data_sets named ['ipy_msmts']

However, catalog.list() does include the newly added ipy_msmts dataset, and catalog.load('ipy_msmts') works fine.

Why can't the pipeline access a custom dataset that I manually added to the catalog?

ilja

1 Answer


The problem with your setup is that, although you have added the dataset to the catalog, it is not used anywhere in your pipeline: no node in the 'sim' pipeline declares 'ipy_msmts' as an input. from_inputs can only slice the pipeline at datasets that appear in the pipeline graph itself, which is why you're seeing the error "Pipeline does not contain data_sets named ['ipy_msmts']".
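
For from_inputs=['ipy_msmts'] to resolve, some node in the 'sim' pipeline has to take 'ipy_msmts' as an input. Here is a minimal sketch of what that could look like; the simulate function, output name and node name are hypothetical, not taken from your project:

from kedro.pipeline import Pipeline, node

def simulate(measurements, simulation_params):
    # Hypothetical node function consuming the generated measurements.
    ...

def create_pipeline(**kwargs):
    return Pipeline(
        [
            node(
                func=simulate,
                inputs=["ipy_msmts", "params:simulation"],  # dataset names the pipeline graph knows about
                outputs="sim_results",  # hypothetical output dataset
                name="simulate_node",
            ),
        ]
    )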

In the interactive Jupyter and IPython workflow you can add datasets with catalog.add() and experiment with that data in the notebook, but the dataset is only added to the in-memory catalog of the current session; it is not persisted to your project catalog, and when you exit the interactive session the data is gone. It's also not recommended to do runs in a notebook; run pipelines from the CLI with kedro run instead.
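
If you want the dataset to be available in every session and in CLI runs, register it in the project catalog (conf/base/catalog.yml) rather than adding it interactively. A hedged sketch, assuming GenMsmtsDataSet lives at a hypothetical module path inside your src package:

ipy_msmts:
  # Hypothetical dotted path; point this at wherever GenMsmtsDataSet is defined.
  type: my_project.extras.datasets.gen_msmts_dataset.GenMsmtsDataSet

With an entry like that, plus a node that consumes 'ipy_msmts', a CLI run such as kedro run --pipeline=sim --from-inputs=ipy_msmts,params:simulation should behave like the catalog.yaml-backed run from your question.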

We're constantly improving the IPython/Jupyter workflow, and in particular how you can debug Kedro pipelines. If you have thoughts on that, let us know in this GitHub issue: https://github.com/kedro-org/kedro/issues/1832, or feel free to create a new issue with any ideas you have.

mtheisen