Questions tagged [kedro]

Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.

202 questions
1
vote
0 answers

Kedro-mlflow usage - when to use it from notebooks, and when from kedro pipeline?

I'm a bit confused - what is the common practice for kedro-mlflow usage? It seems slightly uncomfortable to use it only from kedro pipelines, but kedro's intention is fully reproducible research. At the same time rather rare tutorials on…
1
vote
1 answer

Failed while loading data from data set SQLQueryDataSet

I am receiving this error: DataSetError: Failed while loading data from data set SQLQueryDataSet(load_args={}, sql=select * from table) when I run (within kedro jupyter…
1
vote
2 answers

Kedro : Failed to find the pipeline named '__default__'

Having issues with kedro. The 'register_pipelines' function doesn't seem to be running or creating the default Pipeline that I'm returning from it. The error is (kedro-environment) C:\Users\cc667216\OneDrive\DCS_Pipeline\dcs_files>kedro…
Cazforshort
1
vote
2 answers

Parquet file larger than memory consumption of pandas DataFrame

I am storing two different pandas DataFrames as parquet files (through kedro). Both DataFrames have identical dimensions and dtypes (float32) before getting written to disk. Also, their memory consumption in RAM is…
Nils Blum-Oeste
1
vote
1 answer

kedro run as a python command instead of command line

I am getting started with Kedro, so I created a new kedro project for the default iris dataset. I am able to successfully run it with the kedro run command. My question now is how do I run it as a python command? From the documentation I read that the…
BlueMango
1
vote
1 answer

How do I add xlsb files to the catalog in Kedro?

I am using this code in the catalog.yml file: equipment_data: type: pandas.ExcelDataSet filepath: data\01_raw\Equipment Profile.xlsb layer: raw and I am getting an error after executing the kedro run command: kedro.io.core.DataSetError: Failed while…
Akshay Salvi
1
vote
0 answers

Kedro 0.17 Override global.yml with extra params

I'm currently not able to update the globals.yml file with extra params passed at run time as I previously did with Kedro 0.16.x. I run kedro through run.py. @hook_impl def register_config_loader(self, conf_paths: Iterable[str]) ->…
Vinay V
1
vote
1 answer

Specify Kedro data version within DataCatalog?

Is it possible to define the data version with Kedro? type: pandas.CSVDataSet filepath: data/01_raw/company/cars.csv versioned: True load_version: $USER_DEFINED_VERSION # Wanted to do this Currently, Kedro supports using a CLI to specify load…
mediumnok
1
vote
1 answer

How do I reproduce experiments or specify the nodes execution order in Kedro?

Since kedro determines the execution graph based on the nodes' inputs/outputs, the order of execution is non-deterministic. It can vary between runs. Even when I set a seed I may sample different data in different runs. Let's say I have 3 nodes that…
mediumnok
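A minimal sketch of the idea in the question above, in plain Python rather than Kedro itself: the run order is derived from which node produces which dataset, so two nodes with no data dependency between them have no guaranteed relative order. One way to pin the order is to make the later node consume an output of the earlier one. The node and dataset names here are hypothetical, and `graphlib` only stands in for whatever ordering Kedro does internally.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes: node name -> (input datasets, output datasets).
NODES = {
    "split": ({"raw"}, {"train", "test"}),
    "sample_a": ({"train"}, {"sample_a_data"}),
    "sample_b": ({"train"}, {"sample_b_data"}),
}


def execution_order(nodes):
    """Return one valid order: a node runs after the producers of its inputs."""
    # Map each dataset to the node that produces it.
    producers = {out: name for name, (_, outs) in nodes.items() for out in outs}
    # Build node -> set of predecessor nodes; free inputs such as "raw" are skipped.
    graph = {
        name: {producers[i] for i in ins if i in producers}
        for name, (ins, _) in nodes.items()
    }
    return list(TopologicalSorter(graph).static_order())


# sample_a and sample_b are unrelated above, so either could come first.
# Adding sample_a's output as an input of sample_b pins their order.
pinned = dict(NODES)
pinned["sample_b"] = ({"train", "sample_a_data"}, {"sample_b_data"})
order = execution_order(pinned)
```

With the extra dependency in place, `sample_a` always precedes `sample_b` in `order`; without it, only "`split` first" is guaranteed.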
1
vote
2 answers

How would one use databricks delta lake format with Kedro?

We are using kedro in our project. Normally, one can define datasets as such: client_table: type: spark.SparkDataSet filepath: ${base_path_spark}/${env}/client_table file_format: parquet save_args: mode: overwrite Now we're running on…
pascalwhoop
1
vote
1 answer

Adding pandas dependencies after kedro new

I began a new project with kedro new without adding the files from the iris example. The original requirements.txt looked like: black==v19.10b0 flake8>=3.7.9, <4.0 ipython~=7.0 isort>=4.3.21, <5.0 jupyter~=1.0 jupyter_client>=5.1, <…
Guilherme
1
vote
2 answers

kedro: train image classifier with keras ImageDataGenerator

Which kedro dataset should be used when working with images and keras ImageDataGenerator? I know there is ImageDataset but the number of images is too large to fit in memory. And all that keras ImageDataGenerator really needs is a local folder…
evolved
1
vote
2 answers

Can Kedro Create Circular Layers

I am trying to add layer attributes to my catalog. One common pattern I have is to get some data (raw), clean it up, then output a list of parts (pri). I then need metadata for those parts in which I take the list of parts from pri and pass into a…
Waylon Walker
1
vote
4 answers

kedro nodes input accept kwargs?

https://kedro.readthedocs.io/en/stable/kedro.pipeline.node.Node.html#kedro.pipeline.node.Node.inputs I have a function def function(**kwargs): return How can I pass variables to it as node inputs? **inputs** Return node inputs as a list, in the…
mediumnok
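For the `**kwargs` question above, a plain-Python sketch of the mechanism that seems relevant: the Node documentation describes `inputs` as accepting a dict that maps function argument names to dataset names, and a function taking `**kwargs` can receive those as keyword arguments. The code below only imitates that mapping and is not Kedro's implementation; `summarize`, `call_with_inputs`, and the dict-based `catalog` are hypothetical stand-ins.

```python
def summarize(**kwargs):
    """Hypothetical node function that accepts any keyword inputs."""
    return {name: len(value) for name, value in kwargs.items()}


# Hypothetical in-memory stand-in for a data catalog: dataset name -> data.
catalog = {"cars": [1, 2, 3], "parts": [4, 5]}

# Dict form of inputs: function argument name -> dataset name.
inputs = {"vehicles": "cars", "components": "parts"}


def call_with_inputs(func, inputs, catalog):
    """Load each named dataset and pass it under its argument name."""
    loaded = {arg: catalog[dataset] for arg, dataset in inputs.items()}
    return func(**loaded)


result = call_with_inputs(summarize, inputs, catalog)
```

Here `summarize` sees `vehicles` and `components` as keyword arguments, regardless of what the underlying datasets are called.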
1
vote
2 answers

Read from memory for full pipeline, read from files if retry or partial pipeline

How can I use the pipeline to run from memory/file? I think the features are there but I am not sure how I can write the pipeline like this. My use case is: normal pipeline, from step 1 to step 10; run from step 2 to step 10. Imagine at step 1, I…
mediumnok