Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.
Questions tagged [kedro]
202 questions
0 votes, 3 answers
How to run a kedro pipeline interactively like a function
I would like to run Kedro pipelines in a Jupyter notebook with different inputs, so something like this:
data = catalog.load('my_dataset')
params = catalog.load('params:my_params')
pipelines['my_pipeline'](data=data, params=params)
Is there…

ilja
0 votes, 2 answers
DataSetError in Docker Kedro deployment
I am trying to deploy the example Kedro starter project (pandas-iris).
I successfully ran it locally (kedro run) and then, with kedro-docker installed, initialized Docker, built the image, and pushed it to my registry.
Unfortunately, both kedro docker run and docker run…

Andrzej Wodecki
0 votes, 0 answers
How to do SQL-like querying of Parquet files in Kedro
I'm new to Kedro, and I'm wondering whether I can run SQL-like queries on Parquet files instead of using the DataFrame APIs. Please let me know if there is a way.
Thanks in advance!

Pyd
0 votes, 2 answers
Dynamic parameters on datasets in Kedro
I would like to call an API to enrich an existing dataset.
The existing dataset is a CSVDataSet configured in the catalog.
Now I would like to create a node that enriches the CSVDataSet with data from the API, which I have to call for every row in…

ndueck
0 votes, 1 answer
ModuleNotFoundError: No module named 'kedro.versioning'
I have upgraded Kedro to the latest version, but my project uses kedro.versioning, and the latest Kedro has no module of that name. Can anyone please suggest anything?

pc01
0 votes, 1 answer
Kedro template configuration does not load/parse variables
A follow-up to this question. I am using Kedro v0.18.2. I am trying to use the TemplateConfig, so I have created a globals.yml under conf/base, which looks like this:
paths:
  base_path: s3://my_project
datasets:
  pdf: base.PDFDataSet
  png:…

Michael Bahchevanov
0 votes, 1 answer
Looking for the right way to make a Kedro node lazily output two partitioned datasets
I've built a node in Kedro that lazily loads an input partitioned dataset and lazily saves two partitioned datasets as output (following recommendations found in the Kedro community: using a lambda + callable, into a dict comprehension, processing…

SprigganCG
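For the lazy partitioned output question above, here is a minimal sketch in plain Python of the "lambda in a dict comprehension" pattern it mentions. It assumes, as I understand the Kedro documentation, that a partitioned input reaches the node as a dict mapping partition ids to load callables, and that a partitioned output may be returned as a dict of callables that are only invoked at save time. The function name split_partitions and the upper/lower transformations are hypothetical placeholders, not anything from the question.

```python
from typing import Callable, Dict, Tuple

LazyPartitions = Dict[str, Callable[[], str]]


def split_partitions(partitions: LazyPartitions) -> Tuple[LazyPartitions, LazyPartitions]:
    """Return two dicts of callables; nothing is loaded until a callable is invoked.

    `load=load` binds the current loader as a default argument. Without it,
    every lambda would close over the loop variable and use the last loader.
    """
    upper = {key: (lambda load=load: load().upper()) for key, load in partitions.items()}
    lower = {key: (lambda load=load: load().lower()) for key, load in partitions.items()}
    return upper, lower
```

One trade-off in this sketch: each partition is loaded once per output, so a partition that feeds both outputs is read twice.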
0 votes, 2 answers
Access configuration in the pipelines definition (not only nodes)
I want to dynamically alter pipelines based on the configuration provided. Is there a possibility to pass configuration based on the environment to the register_pipelines() or to the create_pipeline() functions?
I've read the documentation about…

WestFlame
0 votes, 1 answer
Rate limiting Kedro API requests
I have a few government datasets that I'm using in my ML model. The problem is that their server is not that great, to put it nicely. Whenever I run my pipeline and pull from their API all at once, their server goes down for a few…

João Areias
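For the rate-limiting question above, one generic approach is to space out the requests on the client side. The sketch below uses only the standard library and is not tied to any Kedro API. The name throttled_map and its parameters are hypothetical; in practice func would be whatever function performs a single request.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    func: Callable[[T], R],
    items: Iterable[T],
    min_interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[R]:
    """Call func on each item, leaving at least min_interval seconds between call starts."""
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Wait out whatever is left of the interval since the previous call began.
            remaining = min_interval - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        yield func(item)
```

The sleep and clock parameters are injectable only so the behavior can be checked without real waiting. Calls happen sequentially here, so this trades throughput for a gentler load on the remote server.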
0 votes, 2 answers
Kedro 0.16.3 and kedro[spark.SparkDataSet] pip libraries cannot be installed together on databricks cluster
Until last week, both the kedro and kedro[spark.SparkDataSet] pip libraries were installed on the cluster. But for the last 3-4 days they won't install together on the cluster. It shows that it's a duplicate library, but my code also fails as sparkdataset…

Msant
0 votes, 3 answers
Kedro using wrong conda environment
I have created a conda environment called Foo. After activating this environment I installed Kedro with pip, since conda was giving me a conflict. Even though I'm inside the Foo environment, when I run:
kedro jupyter lab
It picks up the modules…

João Areias
0 votes, 2 answers
Getting Kedro Custom Dataset for SunPy Maps to write to/from S3
I'm currently attempting to define a custom dataset to read/write .fits files to/from S3 as SunPy Maps.
The closest thing to this already in the data catalog is the pillow.ImageDataSet, which supports passing a file object when…

Jordan Barlow
0 votes, 1 answer
How to pull data from a paginated JSON API using kedro (APIDataSet)?
The problem: I would like to retrieve data from a paginated API that sends JSON responses.
Using kedro.extras.datasets.api.APIDataSet, I can query the API and retrieve the initial response. However, if there are more results than the size limit per…

afuetterer
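For the pagination question above, the general pattern is a loop that keeps requesting pages until the API reports there are no more. The sketch below is library-agnostic and does not use APIDataSet. The response shape (a "results" list plus a "has_more" flag) and the names fetch_all_pages and fetch_page are assumptions for illustration; a real API will have its own fields and paging scheme.

```python
from typing import Any, Callable, Dict, Iterator


def fetch_all_pages(
    fetch_page: Callable[[int], Dict[str, Any]],
    first_page: int = 1,
    max_pages: int = 1000,
) -> Iterator[Any]:
    """Yield records from consecutive pages until the payload says there are no more.

    max_pages is a safety cap so a misbehaving API cannot cause an endless loop.
    """
    page = first_page
    for _ in range(max_pages):
        payload = fetch_page(page)
        yield from payload.get("results", [])
        if not payload.get("has_more"):
            break
        page += 1


# In-memory stand-in for a real HTTP call, so the sketch runs without a network.
FAKE_PAGES = {
    1: {"results": [1, 2], "has_more": True},
    2: {"results": [3], "has_more": False},
}
records = list(fetch_all_pages(FAKE_PAGES.__getitem__))
```

In a Kedro project this kind of loop could sit inside a node or a custom dataset's load method, with fetch_page wrapping the actual request.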
0 votes, 1 answer
Setup a base dir for the Data Catalog in Kedro
I'm working on a project where, because of the company's compliance rules, the data has to stay in a shared directory that is synchronized among the programmers. The project's code, on the other hand, cannot be in that shared directory, otherwise we…

João Areias
0 votes, 1 answer
Mono repo Kedro project
I started a Kedro project a while ago and started to build different parts of the pipeline which only tangentially interact with each other. In some cases not much at all.
As a consequence, as the project grew, I am starting to get issues with the…

Jonkie