Questions tagged [kedro]
Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.
202 questions
1 vote · 2 answers
Kedro: How to pass "list" parameters from command line?
I'd like to control kedro parameters via the command line.
According to the docs, kedro can accept runtime parameters as follows:
kedro run --params key:value
> {'key': 'value'}
It works. In the same way, I tried to specify list parameters like this:
kedro…

chck
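For the question above, one version-agnostic workaround (a sketch, not the question's accepted answer; all names below are placeholders) is to keep list-valued parameters in conf/base/parameters.yml, where YAML expresses them naturally, and reserve --params for scalar overrides:

```yaml
# conf/base/parameters.yml -- hypothetical parameter file
thresholds: [0.1, 0.5, 0.9]   # lists are natural to express in YAML
country_codes:
  - DE
  - FR
model: baseline               # scalars like this are easy to override on the CLI
```

With this in place, `kedro run --params model:other` overrides only the scalar (newer kedro releases use `=` rather than `:` as the separator). Whether a bracketed form such as `--params "thresholds:[0.1,0.5]"` parses as a list depends on the kedro version, so treat that as something to verify against your installation.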
1 vote · 0 answers
How can I import local package dependencies into Kedro notebooks?
I've placed the package dependencies (wheels) of a Kedro project into a /deps/ directory as *.whl files. I'm using a venv installed into /.venv, managed with Poetry.
The packages are referenced in pyproject.toml like this (here e.g.…

thinwybk
1 vote · 1 answer
How to load a kedro DataSet object dynamically
I am currently using the YAML API to create all of my datasets with kedro==15.5. I would like to be able to peer into this information dynamically from time to time. It appears that I can get to this information with io.datasets, which is a…

Waylon Walker
1 vote · 1 answer
How do I select which columns to load in a Kedro CSVLocalDataSet?
I have a csv file that looks like
a,b,c,d
1,2,3,4
5,6,7,8
and I want to load it as a Kedro CSVLocalDataSet, but I don't want to read the entire file. I only want a few columns (say, a and b).
Is there any way for me to specify the…

Anton Kirilenko
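On the column-selection question above: CSVLocalDataSet delegates loading to pandas.read_csv, and anything under load_args is forwarded to that call, so pandas' usecols option should apply. A minimal catalog sketch (dataset name and path are placeholders):

```yaml
# catalog.yml -- hypothetical entry for an old-style kedro project
my_dataset:
  type: CSVLocalDataSet        # wraps pandas.read_csv under the hood
  filepath: data/01_raw/example.csv
  load_args:
    usecols: [a, b]            # read only these columns
```

Note that usecols restricts which columns pandas keeps, but the whole file is still read from disk; for genuinely partial reads, a columnar format such as Parquet is the usual answer.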
0 votes · 1 answer
Define column names when reading a spark dataset in kedro
With kedro, how can I define the column names when reading a spark.SparkDataSet? Below is my catalog.yaml:
user-playlists:
  type: spark.SparkDataSet
  file_format: csv
  filepath: …

gaut
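For the Spark column-name question above: spark.SparkDataSet forwards load_args to spark.read, so the usual CSV reader options apply. A sketch with placeholder paths:

```yaml
# catalog.yml -- load_args are passed through to spark.read
user-playlists:
  type: spark.SparkDataSet
  file_format: csv
  filepath: data/01_raw/playlists.csv   # placeholder
  load_args:
    header: true        # take column names from the first row
    inferSchema: true   # optionally let Spark infer column types
```

If the file has no header row, a version-agnostic alternative is to load it as-is and rename the columns in the first node with df.toDF("col1", "col2", ...). Whether SparkDataSet also accepts an explicit schema in the catalog varies by kedro version, so check your release before relying on it.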
0 votes · 1 answer
In Kedro, how to handle tar.gz archives from the web
I have a tar.gz file that I am downloading from this link: http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html
What is the best way to fully integrate this TSV data into kedro, perhaps with an API dataset first, and then a node to extract…

gaut
0 votes · 0 answers
How to create a kedro catalog entry + custom DataSet init method with values from credentials.yml
I'm trying to create a custom DataSet class within the kedro framework. I need some help understanding how to combine it with values from the credentials.yml file.
What is the kedro way of handling the 'mongo_url' property in the catalog entry? How do I…

Emilio
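On the custom-DataSet credentials question above: by convention, kedro resolves the credentials key of a catalog entry against credentials.yml and passes the resulting dict into the dataset's constructor as a credentials keyword argument. A sketch (class name, key names, and URL are all placeholders):

```yaml
# conf/local/credentials.yml
mongo_creds:
  mongo_url: mongodb://user:pass@host:27017/db   # placeholder URL

# conf/base/catalog.yml
my_collection:
  type: my_project.extras.datasets.MongoDataSet  # hypothetical custom class
  credentials: mongo_creds                       # resolved from credentials.yml
```

The custom class would then accept credentials in __init__ (e.g. def __init__(self, credentials): ...) and read mongo_url from that dict, keeping the secret out of catalog.yml.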
0 votes · 1 answer
Access Kedro MemoryDataSet when running packaged Kedro pipeline in a script
I want to be able to access a MemoryDataSet results dataframe from a kedro pipeline that I've imported into a script, after packaging the pipeline into a Python package.
I have a kedro pipeline, written and run with kedro==0.18.9, that collects data,…

trevinator
0 votes · 1 answer
Kedro: use an Azure key and scope, then pull data from Snowflake via a Databricks cluster
I'm currently working on a requirement where we use a scope/key credential to access data in our Snowflake instance and load it into our blob storage, using our Databricks cluster. I was already able to do those things in a…
0 votes · 1 answer
Login timeout error for a pyodbc connection string in kedro
I'm trying to build a connection string in kedro for a SQLQueryDataSet, to connect to MSSQL via pyodbc on Azure Databricks, but I encountered an error:
(pyodbc.OperationalError) ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL…

saloni
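For the login-timeout question above: pandas.SQLQueryDataSet takes a SQLAlchemy connection string, usually supplied via credentials. A sketch with placeholder values throughout:

```yaml
# conf/local/credentials.yml -- every value here is a placeholder
mssql_creds:
  con: mssql+pyodbc://USER:PASSWORD@SERVER:1433/DATABASE?driver=ODBC+Driver+17+for+SQL+Server

# conf/base/catalog.yml
my_query:
  type: pandas.SQLQueryDataSet
  sql: SELECT 1 AS ok
  credentials: mssql_creds
```

HYT00 is a login timeout, which in practice usually means the server is unreachable from the Databricks cluster (firewall rules, wrong host or port) or that the driver name in the URL doesn't match an installed ODBC driver, rather than a kedro-level problem.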
0 votes · 1 answer
Interpolate SQL in a SQLQueryDataSet in catalog.yml
Is there a way to interpolate a SQLQueryDataSet query in catalog.yml by passing some argument/parameter?
Example:
person:
  type: pandas.SQLQueryDataSet
  sql: "SELECT * FROM public.people WHERE id = ${id};"
  credentials: db_credentials
Thanks in advance!

Nikola
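On the interpolation question above: in kedro 0.17/0.18, the TemplatedConfigLoader substitutes ${...} placeholders in catalog.yml from a globals file, which covers values known at startup (a sketch; the id value is a placeholder):

```yaml
# conf/base/globals.yml -- values injected into ${...} placeholders
id: 42

# conf/base/catalog.yml
person:
  type: pandas.SQLQueryDataSet
  sql: SELECT * FROM public.people WHERE id = ${id}
  credentials: db_credentials
```

Enabling it means setting CONFIG_LOADER_CLASS = TemplatedConfigLoader and passing globals_pattern="*globals.yml" via CONFIG_LOADER_ARGS in settings.py. For values only known at load time, pandas.SQLQueryDataSet also forwards load_args to pandas.read_sql_query, whose params argument can bind query parameters; verify both behaviours against your kedro version.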
0 votes · 1 answer
Adding multiple data catalogs into one in kedro
I am dynamically creating the pipelines, nodes, and catalogs for a project. After creating them, I want to combine the pipelines and the catalogs. Combining the pipelines is possible with the sum function, but that doesn't work for data catalogs. Is there any way where…
0 votes · 0 answers
Colorful notebook output with rich library
Can someone tell me how to restore the colorful output in a Jupyter notebook without using rich.print? I use VSCode.
I had this feature with kedro==0.18.4 and lost it with kedro==0.18.5. Kedro requires rich as a dependency.
I think it was a…

matt91t
0 votes · 1 answer
Parametrize input datasets in kedro
I'm trying to move my project into a kedro pipeline, but I'm struggling with the following step:
my prediction pipeline is run by a scheduler, which supplies all the necessary parameters (dates, country codes, etc.). Up until now I had a…

w_sz
0 votes · 0 answers
Kedro dynamic catalog creation only for specific nodes, before they run
I have several thousand files of different types to process. I am using dynamic catalog creation with hooks. I first used the after_catalog_created hook, but it runs too early, and I need those entries only for specific nodes. My attempt is with…

AHR