Questions tagged [kedro]

Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.

202 questions
0 votes, 1 answer

Versioned Datasets in Kedro

Situation: I have monthly snapshots that should look like this: snapshot-2021-10.parquet, snapshot-2021-11.parquet, snapshot-2021-12.parquet, snapshot-2022-01.parquet, snapshot-2022-02.parquet. In the processing, I need the last n (say, 3) before a given…
user1965813
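The selection step described in this excerpt can be sketched in plain Python, independent of any Kedro API. This is only an illustration under stated assumptions: the function name `last_n_snapshots` is hypothetical, the cutoff is taken to be a month in `YYYY-MM` form, and "before" is read as strictly before that month, which the truncated excerpt does not confirm.

```python
import re

# Assumed naming scheme, taken from the file names in the excerpt.
SNAPSHOT_RE = re.compile(r"^snapshot-(\d{4})-(\d{2})\.parquet$")


def last_n_snapshots(filenames, before, n=3):
    """Return up to n snapshot names dated strictly before `before`, oldest first.

    `before` is a "YYYY-MM" string. Zero-padded year-month strings sort
    chronologically when compared as plain text, so no date parsing is needed.
    """
    if n <= 0:
        return []
    dated = []
    for name in filenames:
        match = SNAPSHOT_RE.match(name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        month = f"{match.group(1)}-{match.group(2)}"
        if month < before:
            dated.append((month, name))
    dated.sort()
    return [name for _, name in dated[-n:]]


files = [
    "snapshot-2021-10.parquet",
    "snapshot-2021-11.parquet",
    "snapshot-2021-12.parquet",
    "snapshot-2022-01.parquet",
    "snapshot-2022-02.parquet",
]
print(last_n_snapshots(files, before="2022-02", n=3))
# ['snapshot-2021-11.parquet', 'snapshot-2021-12.parquet', 'snapshot-2022-01.parquet']
```

How the resulting names are then loaded (for example from a node that receives the available partitions) is left open here, since the excerpt does not show the catalog setup.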
0 votes, 0 answers

Kedro documentation does not show all functions after running kedro build-docs

I tried documenting my project using the kedro build-docs command. . ├── docs └── src ├── setup.py ├── tests ├── │ ├── __init__.py │ ├── __main__.py │ ├── __pycache__ │ ├── a │ ├── hooks.py │ ├──…
0 votes, 1 answer

How to plot on kedro mlflow ui x1=array/list/dict and y1=array/list/dict?

I am new to Kedro, and I don't know if I am asking the right question here. Is it possible to plot x and y lists in the kedro mlflow UI? I am running a Kedro pipeline with MLflow. I have a catalog.yaml in which I log metrics and artifacts. The end goal…
0 votes, 2 answers

How do I run a local script using GitHub Actions?

Hello, I am using Kedro (a pipeline tool) and want to use GitHub Actions to trigger a kedro command (kedro run) whenever I push to my GitHub repo. Since I have all the data in my local repo, I thought it would make sense to run the kedro…
0 votes, 1 answer

How to fetch complex MongoDB Data from Kedro?

I'm attempting to get hands-on with Kedro, but don't understand how to build my data fetcher (which I used before). My data is stored in a MongoDB instance across multiple “tables”. One table holds my usernames. First, I want to fetch them. Thereafter, based…
corusm
0 votes, 0 answers

How to install new package with conda without breaking the existing Kedro installation?

I have a working conda environment with Kedro installed. The .yml file is available at link 1. My Kedro pipelines work fine in this environment. However, when I try to install the matplotlib package with conda, I get the following warning: The…
Ildar
0 votes, 1 answer

Kedro pipeline on partitioned data

I work on partitioned data (partitioned parquet or SQL table with a "partition" column). I want Kedro to load and save data from a partition I provide at runtime (e.g. kedro run --params partition:A). The number of partitions is large and dynamic. I…
techtech
0 votes, 2 answers

How to save a Kedro dataset in Azure and still have it in memory

I want to save a Kedro memory dataset to Azure as a file and still keep it in memory, as my pipeline will use it later. Is this possible in Kedro? I looked at transcoding datasets, but it does not look possible. Is…
0 votes, 1 answer

Why doesn't my Kedro starter prompt for input?

I would like to create my own Kedro starter. I have tried to replicate the relevant portions of the pandas iris starter. I have a cookiecutter.json file with what I believe are appropriate mappings, and I have changed the repo and package directory…
Tashus
0 votes, 2 answers

Kedro run pointing to a previously used Azure Data Lake

I'm trying to read from / write to another one of my ADLS Gen2 storage accounts. Until now, it worked perfectly with an old one. I updated credentials.yml with the new account name and key, but it seems like my catalog is always pointing to my old…
Downforu
0 votes, 1 answer

How to make a Kedro pipeline take configurable input dataframes?

I have created a workflow in Kedro made of different data science processing pipelines. These pipelines are tested independently. When I run a particular Kedro pipeline in standalone fashion, the pipeline takes its input from a CSV file. In the…
user2715182
0 votes, 1 answer

Azure data source throwing error in Kedro data catalog

I am facing an error when configuring an Azure Blob Storage dataset in the Kedro data catalog. I have the dataset defined in my catalog.yml as below: brand_dataset: type: pandas.CSVDataSet filepath: "abfs://container/my_file.csv" credentials: my_creds …
0 votes, 1 answer

Logging the git_sha as a parameter on MLflow using Kedro hooks

I would like to log the git_sha parameter on MLflow as shown in the documentation. It appears to me that simply running the following portion of code should be enough to get git_sha logged in the MLflow UI. Am I right? @hook_impl def…
Downforu
0 votes, 1 answer

Azure Data Lake Storage Gen2 (ADLS Gen2) as a data source for Kedro pipeline

According to Kedro's documentation, Azure Blob Storage is one of the available data sources. Does this extend to ADLS Gen2? I haven't tried Kedro yet, but before I invest some time in it, I wanted to make sure I could connect to ADLS Gen2. Thank you…
Downforu
0 votes, 1 answer

Adding parameters in Kedro Pipeline

I am trying to write test cases for a Kedro pipeline. I have params:lr as input for my model training node. It's not being loaded from the parameters of the training pipeline, nor from parameters.yml. How do I make sure a specific set of parameters are…
N. Bhattarai