Questions tagged [kedro]

Kedro is an open source Python library that helps you build production-ready data and analytics pipelines

202 questions
1
vote
2 answers

Kedro: How to pass "list" parameters from command line?

I'd like to control kedro parameters via the command line. According to the docs, kedro can specify runtime parameters as follows: kedro run --params key:value > {'key': 'value'} It works. In the same way, I tried to specify list parameters like this: kedro…
chck
  • 13
  • 3
1
vote
0 answers

How can I import local package dependencies into Kedro notebooks?

I've placed package dependencies (wheels) of a Kedro project into a /deps/*.whl directory. I'm using a venv installed into /.venv and manage it using Poetry. Packages are referenced in pyproject.toml like this (here e.g.…
thinwybk
  • 4,193
  • 2
  • 40
  • 76
1
vote
1 answer

How to load kedro DataSet object dynamically

I am currently using the YAML API to create all of my datasets with kedro==15.5. I would like to be able to peer into this information from time to time dynamically. It appears that I can get to this information with the io.datasets which is a…
Waylon Walker
  • 543
  • 3
  • 10
1
vote
1 answer

How do I select which columns to load in a Kedro CSVLocalDataSet?

I have a csv file that looks like a,b,c,d 1,2,3,4 5,6,7,8 and I want to load it in as a Kedro CSVLocalDataSet, but I don't want to read the entire file. I only want a few columns (say a and b for example). Is there any way for me to specify the…
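For reference, the column selection described above can be sketched with plain pandas. This assumes the dataset forwards its load arguments to pandas.read_csv, which is an assumption about the dataset and is not confirmed by the excerpt. The in-memory CSV mirrors the file in the question.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file from the question.
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"

# usecols restricts parsing to the named columns only.
df = pd.read_csv(io.StringIO(csv_text), usecols=["a", "b"])

print(list(df.columns))
```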
0
votes
1 answer

Define column names when reading a spark dataset in kedro

With kedro, how can I define the column names when reading a spark.SparkDataSet? My catalog.yaml is below. user-playlists: type: spark.SparkDataSet file_format: csv filepath:…
gaut
  • 5,771
  • 1
  • 14
  • 45
0
votes
1 answer

In Kedro, how to handle tar.gz archives from the web

I have a tar.gz file that I am downloading from this link: http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html What is the best way to fully integrate this TSV data into kedro, perhaps with an API dataset first, and then a node to extract…
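The extraction step that such a node might perform can be sketched with the standard library alone. The archive below is built in memory as a stand-in for the download, and the member name sample.tsv and its columns are hypothetical, not taken from the real dataset.

```python
import csv
import io
import tarfile

# Build a tiny tar.gz in memory as a stand-in for the downloaded archive.
tsv_bytes = b"user\tartist\nu1\tA\nu2\tB\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="sample.tsv")  # hypothetical member name
    info.size = len(tsv_bytes)
    archive.addfile(info, io.BytesIO(tsv_bytes))
buffer.seek(0)

# Read the TSV member straight from the archive without unpacking to disk.
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    member = archive.extractfile("sample.tsv")
    text = io.TextIOWrapper(member, encoding="utf-8")
    rows = list(csv.reader(text, delimiter="\t"))

print(rows)
```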
gaut
  • 5,771
  • 1
  • 14
  • 45
0
votes
0 answers

How to create a kedro catalog entry + custom DataSet init method with values from credentials.yml

I'm trying to create a custom DataSet class within the kedro framework. I need some help understanding how to combine values from the credentials.yml file. What is the kedro way of handling the 'mongo_url' property in the catalog entry? How do I…
Emilio
  • 33
  • 2
0
votes
1 answer

Access Kedro MemoryDataSet when running packaged Kedro pipeline in a script

I want to be able to access a MemoryDataSet results dataframe from a kedro pipeline that I've imported into a script after packaging the pipeline into a python package. I have a kedro pipeline written and run using Kedro=0.18.9 that collects data,…
trevinator
  • 31
  • 1
  • 5
0
votes
1 answer

Kedro: Use Azure key and scope, then pull data from Snowflake via Databricks cluster

I'm currently running into a requirement where we are using a scope/key credential to access/grab data from our Snowflake instance, then load it in our blob storage, while using our Databricks cluster. I was already able to do those things in a…
0
votes
1 answer

Login timeout error for pyodbc connection string in kedro

I'm trying to build a connection string in kedro for SQLQueryDataset to connect to MSSQL with pyodbc in Azure Databricks, but I encountered an error: (pyodbc.OperationalError) ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL…
saloni
  • 1
0
votes
1 answer

Interpolate sql in SQLDataset in catalog.yml

Is there a way to interpolate a SQLDataset query in catalog.yml by passing an argument or parameter? Example: person: type: pandas.SQLQueryDataSet sql: "SELECT * FROM public.people WHERE id = ${id};" credentials: db_credentials Thanks in advance!
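As an illustration of the ${id} placeholder syntax only, the substitution can be sketched with Python's standard string.Template. This is not Kedro's own templating mechanism, and the value 42 is hypothetical. For values that come from users, a parameterized query is safer than string substitution.

```python
from string import Template

# Query text copied from the catalog entry in the question.
query = Template("SELECT * FROM public.people WHERE id = ${id};")

# Hypothetical value, supplied at runtime.
rendered = query.substitute(id=42)

print(rendered)
```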
Nikola
  • 620
  • 2
  • 5
  • 18
0
votes
1 answer

Adding multiple data catalog into one in kedro

I am dynamically creating pipelines, nodes and catalogs for a project. After creating them, I want to add the pipelines and the catalogs together. Adding the pipelines is possible with the sum function, but it's not possible for the data catalog. Is there any way where…
0
votes
0 answers

Colorful notebook output with rich library

Could someone tell me how to restore the colorful output in a Jupyter notebook without using rich.print? I use VSCode. I had this feature with kedro=0.18.4 and lost it with kedro=0.18.5. Kedro requires rich as a dependency. I think it was a…
matt91t
  • 103
  • 1
  • 8
0
votes
1 answer

Parametrize input datasets in kedro

I'm trying to move my project into a kedro pipeline but I'm struggling with the following step: my prediction pipeline is being run by a scheduler. The scheduler supplies all the necessary parameters (dates, country codes etc.). Up until now I had a…
w_sz
  • 332
  • 1
  • 8
0
votes
0 answers

Kedro dynamic catalog creation only for specific nodes before their run

I have several thousand files of different types to process. I am using dynamic catalog creation with hooks. I first used the after_catalog_created hook, but it runs too early and I need those entries only for specific nodes. My try is with…
AHR
  • 99
  • 8