Questions tagged [kedro]

Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.

202 questions
0 votes, 1 answer

How to dynamically pass save_args to kedro catalog?

I'm trying to write Delta tables in Kedro. Changing the file format to delta writes the data as Delta tables with the mode set to overwrite. Previously, a node in the raw layer (meta_reload) created a dataset that determines the start date for…
Sandeep Gunda
0 votes, 1 answer

Saving data with DataCatalog

I was looking at the iris example project provided by Kedro. Apart from logging the accuracy, I also wanted to save the predictions and test_y as a CSV. This is the example node provided by Kedro: def report_accuracy(predictions: np.ndarray, test_y:…
BlueMango
0 votes, 1 answer

Building an autoencoder with Keras and Kedro

I'm trying to build an autoencoder, and I'm sure I'm doing something wrong. I tried separating the creation of the model from the actual training, but this is not working out for me and gives the following error: AssertionError: Could…
João Areias
0 votes, 1 answer

How to use SQL Server Bulk Insert in Kedro Node?

I am managing a data pipeline using Kedro. At the last step I have a huge CSV file stored in an S3 bucket, and I need to load it back into SQL Server. I'd normally do that with a bulk insert, but I'm not quite sure how to fit that into the kedro…
filippo
0 votes, 1 answer

Kedro can not find SQL Server table

I have these two datasets defined:

    flp_test_query:
      type: pandas.SQLQueryDataSet
      credentials: dw_dev_credentials
      sql: select numero from dwdb.dwschema.flp_tst
      load_args:
        index_col: [numero]
    flp_test:
      type: pandas.SQLTableDataSet
      …
filippo
0 votes, 1 answer

How to use Chunk Size for kedro.extras.datasets.pandas.SQLTableDataSet in the kedro pipeline?

I am using kedro.extras.datasets.pandas.SQLTableDataSet and would like to use the chunksize argument from pandas. However, when running the pipeline, the table gets treated as a generator instead of a pd.DataFrame. How would you use the…
0 votes, 1 answer

Adding stream_results=True (execution_options) to kedro.extras.datasets.pandas.SQLQueryDataSet

Is it possible to add execution_options to kedro.extras.datasets.pandas.SQLQueryDataSet? For example, I would like to add stream_results=True to the connection string. engine = create_engine( "postgresql://postgres:pass@localhost/example" ) conn =…
0 votes, 2 answers

Kedro Conditional Pipes (or alternatives)

I am currently examining different design pattern options for our pipelines. The Kedro framework seems like a good option (it allows a modular design pattern, visualization methods, etc.). The pipelines should be created out of many modules that are…
Jumpman
0 votes, 3 answers

What does this Python function signature mean in the Kedro tutorial?

I am looking at the Kedro library, as my team is considering using it for our data pipeline. While going through the official tutorial (Spaceflight), I came across this function: def preprocess_companies(companies: pd.DataFrame) ->…
0 votes, 2 answers

TemplatedConfigLoader in register_config_loader not replacing patterns in catalog.yml (kedro)

I am using kedro to manage some data, for which I have a number of dataset CSVs in the same location. As described here, I should be able to store the filepath to this location in a globals.yml file, and use the ${...} syntax in my catalog, but I…
0 votes, 1 answer

SQLAlchemy Oracle - InvalidRequestError: could not retrieve isolation level

I am having problems accessing tables in an Oracle database over a SQLAlchemy connection. Specifically, I am using Kedro catalog.load('table_name') and getting the error message Table table_name not found. So I decided to test my connection using…
Pierre Delecto
0 votes, 1 answer

Parallelism for Entire Kedro Pipeline

I am working on a project where we are processing very large images. The pipeline has several nodes, where each produces output necessary for the next node to run. My understanding is that the ParallelRunner is running the nodes in parallel. It is…
0 votes, 1 answer

Is there a way to change hooks dynamically in Kedro?

I know I can add any CLI option via kedro_cli.py, but I can't find out how to dynamically change which hooks are loaded. I'm using kedro-mlflow, whose features are provided via hooks, and sometimes I want to temporarily turn off MLflow logging. If it's…
0 votes, 1 answer

How to create a list of catalog entries and pass them in as inputs in Kedro Pipeline

I am trying to get a list of datasets from a catalog file I have created and pass them in as inputs to a single node that combines them, and ultimately run the pipeline on Airflow using the kedro-airflow plugin. This works on the CLI with kedro run but…
Metrd
0 votes, 1 answer

Kedro - Can't instantiate abstract class ProjectContext with abstract methods project_name, project_version

I'm new to Kedro and I have a problem when opening Jupyter Lab/Notebook from Kedro using the command kedro jupyter lab. The error was: TypeError: Can't instantiate abstract class ProjectContext with abstract methods project_name, project_version Run…