Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.
Questions tagged [kedro]
202 questions
1 vote, 0 answers
Kedro-mlflow usage - when to use it from notebooks, and when from kedro pipeline?
I'm a bit confused: what is the common practice for kedro-mlflow usage? It seems slightly uncomfortable to use it only from Kedro pipelines, but Kedro's intention is fully reproducible research.
At the same time rather rare tutorials on…

Andrey Bondarenko
1 vote, 1 answer
Failed while loading data from data set SQLQueryDataSet
I am receiving this error:
DataSetError: Failed while loading data from data set SQLQueryDataSet(load_args={}, sql=select * from table)
when I run (within kedro jupyter…

Jacob Weiss
1 vote, 2 answers
Kedro: Failed to find the pipeline named '__default__'
Having issues with kedro. The 'register_pipelines' function doesn't seem to be running or creating the default Pipeline that I'm returning from it.
The error is
(kedro-environment) C:\Users\cc667216\OneDrive\DCS_Pipeline\dcs_files>kedro…

Cazforshort
1 vote, 2 answers
Parquet file larger than memory consumption of pandas DataFrame
I am storing two different pandas DataFrames as parquet files (through kedro).
Both DataFrames have identical dimensions and dtypes (float32) before getting written to disk. Also, their memory consumption in RAM is…

Nils Blum-Oeste
1 vote, 1 answer
kedro run as a python command instead of command line
I am getting started with Kedro, so I created a new Kedro project for the default iris dataset.
I am able to successfully run it with the kedro run command. My question now is how do I run it as a Python command? From the documentation I read that the…

BlueMango
1 vote, 1 answer
How do I add xlsb files to the catalog in Kedro?
1. I am using this code in the catalog.yml file:
equipment_data:
  type: pandas.ExcelDataSet
  filepath: data\01_raw\Equipment Profile.xlsb
  layer: raw
I get an error after executing the kedro run command.
kedro.io.core.DataSetError: Failed while…

Akshay Salvi
1 vote, 0 answers
Kedro 0.17 Override global.yml with extra params
I'm currently unable to update the globals.yml file with extra params passed at run time, as I previously did with Kedro 0.16.x. I run Kedro through run.py.
@hook_impl
def register_config_loader(self, conf_paths: Iterable[str]) ->…

Vinay V
1 vote, 1 answer
Specify Kedro data version within DataCatalog?
Is it possible to define the data version with Kedro?
type: pandas.CSVDataSet
filepath: data/01_raw/company/cars.csv
versioned: True
load_version: $USER_DEFINED_VERSION # Wanted to do this
Currently, Kedro supports using a CLI to specify load…

mediumnok
1 vote, 1 answer
How do I reproduce experiments or specify the nodes execution order in Kedro?
Since Kedro determines the execution graph from the nodes' inputs/outputs, the order of execution is non-deterministic. It can vary between runs.
Even when I set a seed I may sample different data in different runs.
Let's say I have 3 nodes that…

mediumnok
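The premise in the question above, that execution order follows from node inputs and outputs, can be illustrated with a minimal sketch. It uses Python's standard-library graphlib rather than Kedro itself; the node and dataset names are hypothetical, and sorting ready nodes by name is only one assumed way to make the order repeatable, not a description of what Kedro does internally.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes: name -> (input datasets, output datasets).
NODES = {
    "split": (["clean"], ["train", "test"]),
    "clean": (["raw"], ["clean"]),
    "fit": (["train"], ["model"]),
    "score": (["model", "test"], ["metrics"]),
}


def execution_order(nodes):
    """Return node names in a dependency-respecting, repeatable order."""
    # Map each dataset to the node that produces it.
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    # A node depends on the producers of its inputs; raw inputs have no producer.
    deps = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    order = []
    while sorter.is_active():
        # Break ties between simultaneously ready nodes by name.
        ready = sorted(sorter.get_ready())
        order.extend(ready)
        sorter.done(*ready)
    return order


ORDER = execution_order(NODES)
```

Any remaining run-to-run variation would then have to come from the node functions themselves (for example unseeded sampling), not from the ordering step.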
1 vote, 2 answers
How would one use databricks delta lake format with Kedro?
We are using kedro in our project. Normally, one can define datasets as such:
client_table:
  type: spark.SparkDataSet
  filepath: ${base_path_spark}/${env}/client_table
  file_format: parquet
  save_args:
    mode: overwrite
Now we're running on…

pascalwhoop
1 vote, 1 answer
Adding pandas dependencies after kedro new
I began a new project with kedro new without adding the files from the iris example. The original requirements.txt looked like:
black==v19.10b0
flake8>=3.7.9, <4.0
ipython~=7.0
isort>=4.3.21, <5.0
jupyter~=1.0
jupyter_client>=5.1, <…

Guilherme
1 vote, 2 answers
kedro: train image classifier with keras ImageDataGenerator
Which kedro dataset should be used when working with images and keras ImageDataGenerator? I know there is ImageDataset but the number of images is too large to fit in memory. And all that keras ImageDataGenerator really needs is a local folder…

evolved
1 vote, 2 answers
Can Kedro Create Circular Layers
I am trying to add layer attributes to my catalog. One common pattern I have is to get some data (raw), clean it up, then output a list of parts (pri). I then need metadata for those parts, for which I take the list of parts from pri and pass it into a…

Waylon Walker
1 vote, 4 answers
kedro nodes input accept kwargs?
https://kedro.readthedocs.io/en/stable/kedro.pipeline.node.Node.html#kedro.pipeline.node.Node.inputs
I have a function
def function(**kwargs):
    return
How can I pass variables to it as node inputs?
**inputs**
Return node inputs as a list, in the…

mediumnok
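One way to read the question above: Kedro's node accepts a dict for inputs that maps function argument names to dataset names, so the values arrive as keyword arguments. The sketch below mimics that binding in plain Python without importing Kedro; run_node, the catalog contents, and the dataset names are hypothetical stand-ins, not Kedro API.

```python
def summarize(**kwargs):
    """Example function that only accepts keyword arguments."""
    return {name: len(values) for name, values in kwargs.items()}


def run_node(func, inputs, catalog):
    """Hypothetical helper: resolve {argument name: dataset name} and call func."""
    resolved = {arg: catalog[dataset] for arg, dataset in inputs.items()}
    return func(**resolved)


# Hypothetical in-memory "catalog" of dataset name -> loaded data.
CATALOG = {"cars_raw": [1, 2, 3], "parts_raw": [4, 5]}

RESULT = run_node(summarize, {"cars": "cars_raw", "parts": "parts_raw"}, CATALOG)
```

With Kedro installed, the matching declaration would presumably be node(summarize, inputs={"cars": "cars_raw", "parts": "parts_raw"}, outputs="summary"), where "summary" is likewise a made-up dataset name.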
1 vote, 2 answers
Read from memory for full pipeline, read from files if retry or partial pipeline
How can I use the pipeline to run from memory/file? I think the features are there but I am not sure how I can write the pipeline like this.
My use case is:
normal pipeline, from step 1 to step 10
run from step 2 to step 10
Imagine at step 1, I…

mediumnok