Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.
Questions tagged [kedro]
202 questions
0
votes
1 answer
Versioned Datasets in Kedro
Situation:
I have monthly snapshots that should look like this
snapshot-2021-10.parquet
snapshot-2021-11.parquet
snapshot-2021-12.parquet
snapshot-2022-01.parquet
snapshot-2022-02.parquet
In the processing, I need the last n (say: 3) before a given…
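A minimal sketch of the selection step described in the excerpt, in plain Python with no Kedro API involved. It assumes the cut-off is a (year, month) pair and that "before" means strictly earlier; the helper name `last_n_snapshots` is hypothetical and not part of the original question.

```python
import re

# Assumption: file names follow the snapshot-YYYY-MM.parquet pattern listed above.
SNAPSHOT_RE = re.compile(r"^snapshot-(\d{4})-(\d{2})\.parquet$")


def last_n_snapshots(filenames, before, n=3):
    """Return up to n snapshot names dated strictly before `before`, oldest first.

    `before` is a (year, month) tuple. This helper is hypothetical.
    """
    if n <= 0:
        return []
    dated = []
    for name in filenames:
        match = SNAPSHOT_RE.match(name)
        if match is None:
            continue  # skip files that do not look like monthly snapshots
        key = (int(match.group(1)), int(match.group(2)))
        if key < before:  # tuple comparison orders by year, then month
            dated.append((key, name))
    dated.sort()
    return [name for _, name in dated[-n:]]


files = [
    "snapshot-2021-10.parquet",
    "snapshot-2021-11.parquet",
    "snapshot-2021-12.parquet",
    "snapshot-2022-01.parquet",
    "snapshot-2022-02.parquet",
]
selected = last_n_snapshots(files, before=(2022, 2), n=3)
```

In a Kedro project, the resulting names could then be used to pick partitions out of a partitioned dataset, though how that is wired up depends on the catalog setup.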

user1965813
0
votes
0 answers
Kedro documentation does not show all functions after running kedro build-docs
I tried documenting my project using kedro build-docs command.
.
├── docs
└── src
├── setup.py
├── tests
├──
│ ├── __init__.py
│ ├── __main__.py
│ ├── __pycache__
│ ├── a
│ ├── hooks.py
│ ├──…
0
votes
1 answer
How to plot on kedro mlflow ui x1=array/list/dict and y1=array/list/dict?
I am new to kedro, and I don't know if I am asking the right question here.
Is it possible on kedro mlflow ui to plot x and y lists?
I am running a Kedro pipeline with MLflow. I have a catalog.yaml in which I log metrics and artifacts.
The end goal…

user3765968
0
votes
2 answers
How do I run a local script using GitHub Actions
Hello, I am using Kedro (a pipeline tool) and want to use GitHub Actions to trigger a Kedro command (kedro run) whenever I push to my GitHub repo.
Since I have all the data in my local repo, I thought it would make sense to run the kedro…

magical_unicorn
0
votes
1 answer
How to fetch complex MongoDB Data from Kedro?
I'm attempting to get hands-on with Kedro, but I don't understand how to build the data fetcher that I used before.
My data is stored in a MongoDB instance across multiple "tables". One table holds my usernames. First, I want to fetch them.
Thereafter, based…

corusm
0
votes
0 answers
How to install new package with conda without breaking the existing Kedro installation?
I have a working conda environment with Kedro installed. The .yml file is available at link 1. My Kedro pipelines work fine in this environment. However, when I try to install the matplotlib package with conda, I get the following warning:
The…

Ildar
0
votes
1 answer
Kedro pipeline on partitioned data
I work on partitioned data (partitioned parquet or SQL table with a "partition" column). I want Kedro to load and save data from a partition I provide at runtime (e.g. kedro run --params partition:A). The number of partitions is large and dynamic.
I…

techtech
0
votes
2 answers
How to save a Kedro dataset in Azure and still have it in memory
I want to save a Kedro memory dataset in Azure as a file and still keep it in memory, as my pipeline will use it later. Is this possible in Kedro? I tried looking at transcoding datasets, but it does not look possible. Is…

DataEnthusiast
0
votes
1 answer
Why doesn't my Kedro starter prompt for input?
I would like to create my own Kedro starter. I have tried to replicate the relevant portions of the pandas iris starter. I have a cookiecutter.json file with what I believe are appropriate mappings, and I have changed the repo and package directory…

Tashus
0
votes
2 answers
Kedro run pointing to a previously used Azure Data Lake
I'm trying to read from / write to another one of my ADLS Gen2 storage accounts. Until now, it worked perfectly with an old one.
I updated the credentials.yml with the new account name and key but it seems like my catalog is always pointing to my old…

Downforu
0
votes
1 answer
How to make a Kedro pipeline take configurable input dataframes?
I have created a workflow in Kedro made of different data science processing pipelines. These pipelines are tested independently.
When I run a particular Kedro pipeline in standalone fashion, the pipeline takes its input from a CSV file.
In the…

user2715182
0
votes
1 answer
Azure data source throwing error in Kedro DataCatalog
I am facing an error when configuring an Azure Blob Storage dataset in the Kedro DataCatalog.
I have the dataset defined in my catalog.yml as below:
brand_dataset:
  type: pandas.CSVDataSet
  filepath: "abfs://container/my_file.csv"
  credentials: my_creds
…

DataEnthusiast
0
votes
1 answer
Logging the git_sha as a parameter on Mlflow using Kedro hooks
I would like to log the git_sha parameter on MLflow as shown in the documentation. It appears to me that simply running the following portion of code should be enough to get git_sha logged in the MLflow UI. Am I right?
@hook_impl
def…

Downforu
0
votes
1 answer
Azure Data Lake Storage Gen2 (ADLS Gen2) as a data source for Kedro pipeline
According to Kedro's documentation, Azure Blob Storage is one of the available data sources. Does this extend to ADLS Gen2?
I haven't tried Kedro yet, but before I invest time in it, I wanted to make sure I could connect to ADLS Gen2.
Thank you…

Downforu
0
votes
1 answer
Adding parameters in Kedro Pipeline
I am trying to write test cases for a Kedro pipeline. I have params:lr as an input for my model training node. It's not being loaded from the parameters of the training pipeline, nor from parameters.yml.
How do I make sure a specific set of parameters are…

N. Bhattarai