Kedro is an open source Python library that helps you build production-ready data and analytics pipelines
Questions tagged [kedro]
202 questions
0
votes
1 answer
How to dynamically pass save_args to kedro catalog?
I'm trying to write Delta tables in Kedro. Changing the file format to delta writes the data as Delta tables with mode set to overwrite.
Previously, a node in the raw layer (meta_reload) creates a dataset that determines what's the start date for…

Sandeep Gunda
- 171
- 1
- 10
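A likely starting point (a sketch, not a definitive answer — the dataset name and filepath are illustrative) is a catalog entry using spark.SparkDataSet with file_format: delta and the write mode under save_args:

```yaml
# conf/base/catalog.yml — sketch; name and path are illustrative
my_delta_table:
  type: spark.SparkDataSet
  filepath: data/03_primary/my_delta_table
  file_format: delta
  save_args:
    mode: overwrite
```

To vary save_args at run time rather than hard-coding them, one option is templating the catalog (e.g. TemplatedConfigLoader with ${...} placeholders).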
0
votes
1 answer
Saving data with DataCatalog
I was looking at the iris project example provided by Kedro. Apart from logging the accuracy, I also wanted to save the predictions and test_y as a CSV.
This is the example node provided by kedro.
def report_accuracy(predictions: np.ndarray, test_y:…

BlueMango
- 463
- 7
- 21
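A likely fix (sketch; the dataset names and filepaths are illustrative) is to have the node return the objects to be saved and register matching catalog entries, letting the DataCatalog do the writing:

```yaml
# conf/base/catalog.yml — sketch; names and paths are illustrative
predictions:
  type: pandas.CSVDataSet
  filepath: data/07_model_output/predictions.csv
test_y:
  type: pandas.CSVDataSet
  filepath: data/07_model_output/test_y.csv
```

The node would then return the two objects (converted to DataFrames), and the pipeline would map its outputs to ["predictions", "test_y"].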
0
votes
1 answer
Building an autoencoder with Keras and Kedro
I'm trying to build an autoencoder, and I'm sure I'm doing something wrong. I tried separating the creation of the model from the actual training, but this is not really working out for me and is giving me the following error.
AssertionError: Could…

João Areias
- 1,192
- 11
- 41
0
votes
1 answer
How to use SQL Server Bulk Insert in Kedro Node?
I am managing a data pipeline using Kedro, and at the last step I have a huge CSV file stored in an S3 bucket that I need to load back into SQL Server.
I'd normally go about that with a bulk insert, but I'm not quite sure how to fit that into the kedro…

filippo
- 5,583
- 13
- 50
- 72
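One way to fit this in (a sketch, assuming the SQL Server instance can read the file, e.g. after it has been copied down from S3) is to issue a raw T-SQL BULK INSERT from inside a node. The helper below only builds the statement; the table and path arguments are illustrative, and executing it would go through an existing SQLAlchemy connection:

```python
def bulk_insert_statement(table: str, csv_path: str, first_row: int = 2) -> str:
    """Build a T-SQL BULK INSERT statement for a CSV file.

    FIRSTROW = 2 skips the header row; FORMAT = 'CSV' requires
    SQL Server 2017 or later.
    """
    return (
        f"BULK INSERT {table} "
        f"FROM '{csv_path}' "
        f"WITH (FORMAT = 'CSV', FIRSTROW = {first_row})"
    )

# A node would then run the statement through its database connection,
# e.g. engine.execute(bulk_insert_statement("dbo.sales", "/tmp/sales.csv"))
```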
0
votes
1 answer
Kedro cannot find SQL Server table
I have these two datasets defined:
flp_test_query:
  type: pandas.SQLQueryDataSet
  credentials: dw_dev_credentials
  sql: select numero from dwdb.dwschema.flp_tst
  load_args:
    index_col: [numero]
flp_test:
  type: pandas.SQLTableDataSet
  …

filippo
- 5,583
- 13
- 50
- 72
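A common cause (a sketch of the likely fix, under the assumption that the table lives in a non-default schema): pandas.SQLTableDataSet calls pandas.read_sql_table, which takes the schema as a separate argument rather than as part of the table name, so the schema goes under load_args:

```yaml
# sketch — table_name holds only the bare table name,
# with the schema passed through to pandas.read_sql_table
flp_test:
  type: pandas.SQLTableDataSet
  credentials: dw_dev_credentials
  table_name: flp_tst
  load_args:
    schema: dwschema
```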
0
votes
1 answer
How to use Chunk Size for kedro.extras.datasets.pandas.SQLTableDataSet in the kedro pipeline?
I am using kedro.extras.datasets.pandas.SQLTableDataSet and would like to use the chunksize argument from pandas. However, when running the pipeline, the table gets treated as a generator instead of a pd.DataFrame.
How would you use the…

Jacob Weiss
- 41
- 4
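This is expected pandas behaviour: with chunksize set, read_sql_table returns an iterator of DataFrames rather than one frame, so the consuming node has to handle the iterator. A minimal, pandas-free sketch of that pattern (the list-based chunks and combine_chunks are stand-ins for DataFrame chunks and pd.concat):

```python
from itertools import chain

def chunked_read(rows, chunk_size):
    """Yield the data in pieces, mimicking a chunksize= read."""
    for i in range(0, len(rows), chunk_size):
        yield rows[i:i + chunk_size]

def combine_chunks(chunks):
    """Flatten an iterator of chunks back into one sequence.

    Stand-in for pd.concat(chunks) on an iterator of DataFrames.
    """
    return list(chain.from_iterable(chunks))

combined = combine_chunks(chunked_read([1, 2, 3, 4, 5], 2))
# combined == [1, 2, 3, 4, 5]
```

The node consuming the dataset would either concatenate the chunks like this or loop over them one at a time to keep memory bounded.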
0
votes
1 answer
Adding stream_results=True (execution_options) to kedro.extras.datasets.pandas.SQLQueryDataSet
Is it possible to add execution_options to kedro.extras.datasets.pandas.SQLQueryDataSet?
For example, I would like to add stream_results=True to the connection string.
engine = create_engine(
    "postgresql://postgres:pass@localhost/example"
)
conn =…

Jacob Weiss
- 11
- 1
0
votes
2 answers
Kedro Conditional Pipes (or alternatives)
I am currently examining different design pattern options for our pipelines. The Kedro framework seems like a good option (allowing a modular design pattern, visualization methods, etc.).
The pipelines should be created out of many modules that are…

Jumpman
- 429
- 1
- 3
- 10
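Since Kedro pipelines are plain objects composed with +, one common pattern is to assemble several variants in the registry and select one at run time (kedro run --pipeline=...). A Kedro-free sketch of the selection logic, with node names as plain strings standing in for Pipeline objects:

```python
def register_pipelines(variant: str = "full"):
    """Stand-in for a pipeline_registry: pick a pipeline by name.

    The string lists are illustrative; in Kedro these would be
    Pipeline objects combined with `+`.
    """
    preprocessing = ["clean", "join"]
    modelling = ["train", "evaluate"]
    pipelines = {
        "preprocessing": preprocessing,
        "full": preprocessing + modelling,
    }
    return pipelines[variant]
```

True per-node branching is not built in; the usual advice is to move the condition into the registry (as above) or inside a node.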
0
votes
3 answers
What does this Python function signature mean in the Kedro tutorial?
I am looking at the Kedro library, as my team is looking into using it for our data pipeline.
While going through the official tutorial - Spaceflights -
I came across this function:
def preprocess_companies(companies: pd.DataFrame) ->…

Kevin Seek
- 5
- 3
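The signature in question only adds type hints: companies: pd.DataFrame annotates the parameter and -> pd.DataFrame annotates the return value. Python stores these as metadata and does not enforce them at runtime. A pandas-free illustration:

```python
def double(x: float) -> float:
    """Type hints document intent; Python does not check them at runtime."""
    return x * 2

result = double(3)  # passing an int still works despite the `float` hint
hints = double.__annotations__
# hints == {'x': float, 'return': float}
```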
0
votes
2 answers
TemplatedConfigLoader in register_config_loader not replacing patterns in catalog.yml (kedro)
I am using kedro to manage some data, for which I have a number of dataset CSVs in the same location. As described here, I should be able to store the filepath to this location in a globals.yml file, and use the ${...} syntax in my catalog, but I…
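For reference, a minimal sketch of the templating setup (keys and paths are illustrative): the substitution only happens if register_config_loader returns a TemplatedConfigLoader constructed with globals_pattern="*globals.yml".

```yaml
# conf/base/globals.yml — sketch; the value is illustrative
base_path: data/01_raw

# conf/base/catalog.yml
companies:
  type: pandas.CSVDataSet
  filepath: ${base_path}/companies.csv
```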
0
votes
1 answer
SQLAlchemy Oracle - InvalidRequestError: could not retrieve isolation level
I am having problems accessing tables in an Oracle database over a SQLAlchemy connection. Specifically, I am using Kedro's catalog.load('table_name') and getting the error message Table table_name not found. So I decided to test my connection using…

Pierre Delecto
- 455
- 1
- 7
- 26
0
votes
1 answer
Parallelism for Entire Kedro Pipeline
I am working on a project where we are processing very large images. The pipeline has several nodes, each of which produces output necessary for the next node to run. My understanding is that the ParallelRunner runs the nodes in parallel. It is…
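ParallelRunner parallelises across independent nodes, so a strictly linear chain gains nothing from it. A common workaround for large images is to parallelise inside a node, for example over tiles. A standard-library sketch (process_tile is a placeholder for the real per-tile work; a CPU-bound version would likely use ProcessPoolExecutor instead):

```python
from concurrent.futures import ThreadPoolExecutor

def process_tile(tile):
    """Placeholder for the real per-tile computation."""
    return sum(tile)

def process_image(tiles, max_workers=4):
    """Process tiles concurrently inside a single Kedro node."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_tile, tiles))

results = process_image([[1, 2], [3, 4], [5, 6]])
# results == [3, 7, 11]
```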
0
votes
1 answer
Is there a way to change hooks dynamically in Kedro?
I know I can add any CLI option via kedro_cli.py, but I can't find out how to change which hooks are loaded dynamically.
I'm using kedro-mlflow, whose features are provided via hooks, and sometimes I want to disable the MLflow logging temporarily.
If it's…

Koichi MIYAMOTO
- 53
- 5
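One option (a sketch, assuming a Kedro version whose settings.py supports DISABLE_HOOKS_FOR_PLUGINS) is to toggle plugin hooks with an environment variable; the KEDRO_DISABLE_MLFLOW name is purely illustrative:

```python
# settings.py — sketch; the environment variable name is illustrative
import os

def disabled_plugin_hooks(env=os.environ):
    """Disable kedro-mlflow's hooks when the toggle variable is set."""
    return ("kedro-mlflow",) if env.get("KEDRO_DISABLE_MLFLOW") else ()

DISABLE_HOOKS_FOR_PLUGINS = disabled_plugin_hooks()
```

Running KEDRO_DISABLE_MLFLOW=1 kedro run would then skip MLflow logging for that invocation.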
0
votes
1 answer
How to create a list of catalog entries and pass them in as inputs in Kedro Pipeline
I am trying to get a list of datasets from a catalog file I have created and pass them in as inputs of a single node, to combine them and ultimately run the pipeline on Airflow using the kedro-airflow plugin.
This works on the CLI with kedro run, but…

Metrd
- 79
- 1
- 6
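A node can take a list of dataset names as inputs=[...] paired with a Python function that accepts them positionally. A sketch of that wiring (the node(...) call is shown as a comment since it needs Kedro installed; combine stands in for pd.concat):

```python
def combine(*datasets):
    """Concatenate any number of inputs; stand-in for pd.concat(datasets)."""
    combined = []
    for ds in datasets:
        combined.extend(ds)
    return combined

# In the pipeline definition (requires kedro), the dataset names are
# illustrative catalog entries:
# node(combine, inputs=["ds_a", "ds_b", "ds_c"], outputs="combined")

result = combine([1], [2, 3], [4])
# result == [1, 2, 3, 4]
```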
0
votes
1 answer
Kedro - Can't instantiate abstract class ProjectContext with abstract methods project_name, project_version
I'm new to kedro and I have a problem when opening Jupyter Lab/Notebook from Kedro using the command kedro jupyter lab.
The error was:
TypeError: Can't instantiate abstract class ProjectContext with abstract methods project_name, project_version
Run…

Adam Ginza
- 1
- 1
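This error usually indicates a version mismatch: the installed Kedro is newer than the project template, and its KedroContext declares abstract attributes (project_name, project_version) that the project's ProjectContext never defines. Python raises exactly this TypeError whenever a subclass leaves abstract members unimplemented, as a minimal abc sketch shows (KedroContextLike is a stand-in, not the real KedroContext):

```python
from abc import ABC, abstractmethod

class KedroContextLike(ABC):
    """Minimal stand-in for KedroContext's abstract interface."""

    @property
    @abstractmethod
    def project_name(self): ...

    @property
    @abstractmethod
    def project_version(self): ...

class BrokenContext(KedroContextLike):
    pass  # abstract members missing -> TypeError on instantiation

class FixedContext(KedroContextLike):
    project_name = "my-project"       # illustrative values; in a real
    project_version = "0.16.5"        # project this matches the kedro version

try:
    BrokenContext()
    message = ""
except TypeError as err:
    message = str(err)

ctx = FixedContext()
```

The corresponding fix in the project is to add project_name and project_version to ProjectContext (matching the installed Kedro version), or to pin Kedro to the version the template was generated with.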