Questions tagged [kedro]

Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.

202 questions
0
votes
3 answers

How to run a kedro pipeline interactively like a function

I would like to run kedro pipelines in a Jupyter notebook with different inputs, so something like this: data = catalog.load('my_dataset') params = catalog.load('params:my_params') pipelines['my_pipeline'](data=data, params=params) Is there…
ilja
  • 109
  • 7
0
votes
2 answers

DataSetError in Docker Kedro deployment

I am trying to deploy the example Kedro starter project (pandas-iris). I successfully ran it locally (kedro run) and then, with kedro-docker installed, initialized Docker, built the image, and pushed it to my registry. Unfortunately, both kedro docker run and docker run…
Andrzej Wodecki
  • 107
  • 1
  • 8
0
votes
0 answers

How to do SQL-like querying of Parquet files in Kedro

I'm new to Kedro and am wondering whether I could query Parquet files with SQL-like syntax instead of using the DataFrame APIs. Please help me out if there is a way. Thanks in advance!
Pyd
  • 6,017
  • 18
  • 52
  • 109
0
votes
2 answers

dynamic parameters on datasets in Kedro

I would like to call an API to enrich an existing dataset. The existing dataset is a CSVDataSet configured in the catalog. Now I would like to create a node that enriches the CSVDataSet with data from the API, which I have to call for every row in…
ndueck
  • 713
  • 1
  • 8
  • 27
0
votes
1 answer

ModuleNotFoundError: No module named 'kedro.versioning'

I have upgraded Kedro to the latest version, but my project uses kedro.versioning, and the latest Kedro has no module with this name. Can anyone please suggest a fix?
pc01
  • 11
  • 1
0
votes
1 answer

Kedro template configuration does not load/parse variables

A follow-up to this question. I am using Kedro v0.18.2. I am trying to use the TemplateConfig, so I have created a globals.yml under conf/base, which looks like this: paths: base_path: s3://my_project datasets: pdf: base.PDFDataSet png:…
0
votes
1 answer

Looking for the right way to make a Kedro node lazily output two partitioned datasets

I've built a node in Kedro that lazily loads an input partitioned dataset and lazily saves two partitioned datasets as output (following recommendations found in the Kedro community: using a lambda + callable, into a dict comprehension, processing…
SprigganCG
  • 59
  • 2
  • 2
0
votes
2 answers

Access configuration in the pipelines definition (not only nodes)

I want to dynamically alter pipelines based on the configuration provided. Is there a possibility to pass configuration based on the environment to the register_pipelines() or to the create_pipeline() functions? I've read the documentation about…
WestFlame
  • 435
  • 1
  • 7
  • 16
0
votes
1 answer

Rate limiting Kedro API requests

I have a few datasets from a government data source that I'm using in my ML model. The problem is that their server is not that great, to put it nicely. Whenever I run my pipeline and pull from their API all at once, their server goes down for a few…
João Areias
  • 1,192
  • 11
  • 41
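One generic way to avoid hitting a fragile server all at once is to space the calls out on the client side. The following is only a minimal sketch in plain Python, not a Kedro feature: throttled_map and its parameters are hypothetical names introduced here for illustration, and the function that actually performs the request is assumed to be supplied by the caller.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    func: Callable[[T], R],
    items: Iterable[T],
    min_interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[R]:
    """Call func on each item, keeping the starts of consecutive calls
    at least min_interval seconds apart.

    sleep and clock are injectable so the pacing can be checked without
    real waiting; both are assumptions of this sketch.
    """
    results: List[R] = []
    last_call = None  # start time of the previous call, if any
    for item in items:
        if last_call is not None:
            # Wait only for the part of the interval that has not elapsed yet.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(func(item))
    return results
```

In a pipeline, something like this could wrap whatever function performs a single request, so that the requests are issued one after another instead of in a burst.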
0
votes
2 answers

Kedro 0.16.3 and kedro[spark.SparkDataSet] pip libraries cannot be installed together on databricks cluster

Until last week, both the kedro and kedro[spark.SparkDataSet] pip libraries were installed on the cluster. But for the last 3-4 days they won't install together on the cluster. It shows that it's a duplicate library, but my code also fails as sparkdataset…
Msant
  • 1
  • 1
0
votes
3 answers

Kedro using wrong conda environment

I have created a conda environment called Foo. After activating this environment I installed Kedro with pip, since conda was giving me a conflict. Even though I'm inside the Foo environment, when I run kedro jupyter lab, it picks up the modules…
João Areias
  • 1,192
  • 11
  • 41
0
votes
2 answers

Getting Kedro Custom Dataset for SunPy Maps to write to/from S3

I'm currently attempting to define a custom dataset to read/write .fits files to/from S3 as SunPy Maps. The closest thing to this already in the data catalog is the pillow.ImageDataSet, which supports passing a file object when…
0
votes
1 answer

How to pull data from a paginated JSON API using kedro (APIDataSet)?

The problem: I would like to retrieve data from a paginated API that sends JSON responses. Using kedro.extras.datasets.api.APIDataSet I can query the API and retrieve the initial response. However if there are more results than the size limit per…
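The general pattern for walking a paginated API is a loop that keeps requesting pages until the response no longer points to a next one. The sketch below illustrates only that loop in plain Python and does not use APIDataSet. The response shape (a "results" list plus a "next_page" value) and the fetch_page callable are assumptions made for illustration; a real API will name these differently.

```python
from typing import Any, Callable, Dict, Iterator


def iter_pages(
    fetch_page: Callable[[int], Dict[str, Any]],
    start: int = 1,
) -> Iterator[Any]:
    """Yield every record from a paginated source.

    fetch_page is a hypothetical callable that takes a page number and
    returns a dict shaped like {"results": [...], "next_page": int or None}.
    """
    page = start
    while page is not None:
        payload = fetch_page(page)
        # Emit the records of this page one by one.
        yield from payload["results"]
        # A missing or None "next_page" ends the loop.
        page = payload.get("next_page")
```

Collecting the generator with list(iter_pages(fetch_page)) gives all records in one list, which a node could then turn into a single dataset.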
0
votes
1 answer

Setup a base dir for the Data Catalog in Kedro

I'm working on a project where, because of the company's compliance rules, the data has to stay in a shared directory that is synchronized among the programmers. The project's code, on the other hand, cannot be in that shared directory, otherwise we…
João Areias
  • 1,192
  • 11
  • 41
0
votes
1 answer

Mono repo Kedro project

I started a Kedro project a while ago and started to build different parts of the pipeline which only tangentially interact with each other. In some cases not much at all. As a consequence, as the project grew, I am starting to get issues with the…
Jonkie
  • 168
  • 1
  • 1
  • 9