I am trying to add layer attributes to my catalog. One common pattern I have is to get some data (raw), clean it up, then output a list of parts (pri). I then need metadata for those parts, so I take the list of parts from pri and pass it into a function that gets more data (raw). The pipeline itself is not circular, but Kedro does not seem to like it when the layers form a cycle.
Is there a common pattern that I am missing for this use case?
Would it be possible to allow layers to be circular?
Example
I have tried to put together a generic example below.
raw_truck_sales:
  type: pandas.ParquetDataSet
  filepath: <filepath>
  layer: raw

int_truck_sales:
  type: pandas.ParquetDataSet
  filepath: <filepath>
  layer: int

pri_truck_sales:
  type: pandas.ParquetDataSet
  filepath: <filepath>
  layer: pri

pri_truck_models_sold:
  type: pandas.ParquetDataSet
  filepath: <filepath>
  layer: pri

raw_truck_metadata:
  type: pandas.ParquetDataSet
  filepath: <filepath>
  layer: raw

int_truck_metadata:
  type: pandas.ParquetDataSet
  filepath: <filepath>
  layer: int

pri_truck_metadata:
  type: pandas.ParquetDataSet
  filepath: <filepath>
  layer: pri

from kedro.pipeline import node

nodes = [
    node(
        get_truck_sales,
        inputs=None,
        outputs='raw_truck_sales',
    ),
    node(
        create_int_truck_sales,
        inputs='raw_truck_sales',
        outputs='int_truck_sales',
    ),
    node(
        create_pri_truck_sales,
        inputs='int_truck_sales',
        outputs='pri_truck_sales',
    ),
    node(
        lambda truck_sales: truck_sales[['model']],
        inputs='pri_truck_sales',
        outputs='pri_truck_models_sold',
    ),
    # This node takes the list of trucks sold and gets metadata for them.
    # It seems to break Kedro's layer model by creating a circular reference:
    # a pri dataset feeds a raw dataset.
    node(
        get_truck_metadata,
        inputs='pri_truck_models_sold',
        outputs='raw_truck_metadata',
    ),
    node(
        create_int_truck_metadata,
        inputs='raw_truck_metadata',
        outputs='int_truck_metadata',
    ),
    node(
        create_pri_truck_metadata,
        inputs='int_truck_metadata',
        outputs='pri_truck_metadata',
    ),
]
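
To make the problem concrete, here is a minimal sketch in plain Python (no Kedro involved) of what I think is happening. It assumes the layer order is derived by projecting dataset dependencies onto their layers; that is my guess, not Kedro's actual implementation. The `LAYERS` and `EDGES` tables and the `has_cycle` helper are hypothetical names written for this illustration, using the dataset names from the node definitions above.

```python
from graphlib import CycleError, TopologicalSorter

# Layer of each dataset (hypothetical table mirroring the catalog layers).
LAYERS = {
    "raw_truck_sales": "raw",
    "int_truck_sales": "int",
    "pri_truck_sales": "pri",
    "pri_truck_models_sold": "pri",
    "raw_truck_metadata": "raw",
    "int_truck_metadata": "int",
    "pri_truck_metadata": "pri",
}

# (input, output) pairs, one per node that has an input.
EDGES = [
    ("raw_truck_sales", "int_truck_sales"),
    ("int_truck_sales", "pri_truck_sales"),
    ("pri_truck_sales", "pri_truck_models_sold"),
    ("pri_truck_models_sold", "raw_truck_metadata"),
    ("raw_truck_metadata", "int_truck_metadata"),
    ("int_truck_metadata", "pri_truck_metadata"),
]


def has_cycle(edges):
    """Return True if the directed graph given by (src, dst) pairs has a cycle."""
    sorter = TopologicalSorter()
    for src, dst in edges:
        sorter.add(dst, src)  # dst depends on src
    try:
        sorter.prepare()  # raises CycleError when no valid ordering exists
    except CycleError:
        return True
    return False


# Project dataset edges onto layers, dropping edges that stay within one layer.
layer_edges = {
    (LAYERS[src], LAYERS[dst])
    for src, dst in EDGES
    if LAYERS[src] != LAYERS[dst]
}

print(has_cycle(EDGES))        # False: the dataset graph is a valid DAG
print(has_cycle(layer_edges))  # True: raw -> int -> pri -> raw
```

So the datasets can be ordered without any problem, but once they are grouped by layer the `pri` to `raw` edge closes the loop, which is the situation I am asking about.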