I can think of two ways, depending on your use case.
a) You could use separate environments for this. When running the full pipeline, you use some environment `regular` where you don't have a catalog entry for the dataset in question (so it will be turned into a `MemoryDataSet`), while in a separate `dev` environment you do have an entry in your `catalog.yml` to save it as a CSV. It does mean you'd have to run `dev` from node 1 at least once, in order to generate the CSV used by subsequent runs.
```
kedro run --env regular
kedro run --env dev
kedro run -e dev --from-nodes node2
```
Relevant docs: https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/02_configuration.html#additional-configuration-environments
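For illustration, the `dev` environment's catalog entry could look like this (the dataset name and filepath are placeholders; substitute your own):

```yaml
# conf/dev/catalog.yml -- only present in the dev environment,
# so runs under other environments keep this dataset in memory
my_intermediate_table:
  type: pandas.CSVDataSet
  filepath: data/02_intermediate/my_intermediate_table.csv
```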
b) Another way to do it, if you always want the first node to write to CSV, is to have node1 return two outputs (the same data), one saved as a `pandas.CSVDataSet` and one kept as a `MemoryDataSet`, and then define two pipelines: `complete`, where the second node reads from memory, and `partial`, which doesn't include node1 and where node2 reads from the CSV dataset instead.
```
kedro run --pipeline complete
kedro run --pipeline partial
```
Relevant docs: https://kedro.readthedocs.io/en/stable/06_nodes_and_pipelines/02_pipelines.html#running-a-pipeline-by-name