Questions tagged [kedro]

Kedro is an open source Python library that helps you build production-ready data and analytics pipelines

202 questions
4
votes
1 answer

Where to perform the saving of a node output in Kedro?

In Kedro, we can pipeline different nodes and partially run some nodes. When we are partially running some nodes, we need to save some inputs from the nodes somewhere so that when another node is run it can access the data that the previous node has…
Baenka
  • 243
  • 3
  • 15
3
votes
1 answer

Kedro - How to update a dataset in a Kedro pipeline given that a dataset cannot be both input and output of a node (only DAG)?

In a Kedro project, I have a dataset in catalog.yml that I need to increment by adding a few lines each time I call my pipeline. #catalog.yml my_main_dataset: type: pandas.SQLTableDataSet credentials: postgrey_credentials save_args: …
3
votes
2 answers

AttributeError: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas

I'm quite new using Kedro and after installing kedro in my conda environment, I'm getting the following error when trying to list my catalog: Command performed: kedro catalog list Error: kedro.io.core.DataSetError: An exception occurred when…
Rubens Rodrigues
  • 165
  • 1
  • 2
  • 10
3
votes
2 answers

Kedro - Memory management

I am working on a Kedro 0.17.2 project that is running into out-of-memory issues and I'm trying to reduce the memory footprint. I'm doing the profiling by using mprof from the memory-profiler library and I noticed that there is always a child process…
lspinheiro
  • 423
  • 1
  • 4
  • 9
3
votes
1 answer

Specified host and port in mlflow.yml and ran "kedro mlflow ui", but host and port stay at the default (localhost:5000)

I built a sample kedro project by following this page, and specified my global IP address as the host in mlflow.yml, but when I run the "kedro mlflow ui" command, it still listens on localhost. Even when I only set the port to 5001 (not the default) in mlflow.yml, it does not…
RCheng
  • 31
  • 1
  • 2
3
votes
1 answer

How to use tf.data.Dataset with kedro?

I am using tf.data.Dataset to prepare a streaming dataset which is used to train a tf.keras model. With kedro, is there a way to create a node and return the created tf.data.Dataset to use it in the next training node? The MemoryDataset will…
evolved
  • 1,850
  • 19
  • 40
3
votes
1 answer

How can I read/write data from/to network attached storage with kedro?

In the API docs about kedro.io and kedro.contrib.io I could not find info about how to read/write data from/to network attached storage such as e.g. FritzBox NAS.
thinwybk
  • 4,193
  • 2
  • 40
  • 76
3
votes
1 answer

How to convert Spark data frame to Pandas and back in Kedro?

I'm trying to understand the optimal way in Kedro to convert a Spark dataframe coming out of one node into the Pandas dataframe required as input for another node, without creating a redundant conversion step.
Dmitry Deryabin
  • 1,518
  • 2
  • 14
  • 27
3
votes
1 answer

How to change the process count of the ParallelRunner in Kedro?

My pipeline makes a lot of HTTP requests. It’s not a CPU-heavy operation, so I’d like to spin up more processes than the number of CPU cores. How can I change this?
921kiyo
  • 584
  • 4
  • 14
2
votes
0 answers

logging in python and kedro, how to log only DEBUG info to a file and INFO to console

I'm trying to configure logging so that INFO level messages go to the console and DEBUG level messages go to a file instead. So far, I am able to get INFO to the console and DEBUG to the file working; the problem is that DEBUG is also being output to…
Emilio
  • 33
  • 2
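The level split described in this question can be sketched with the standard library alone. This is a minimal illustration, not Kedro's own logging configuration: `OnlyLevel` and `build_logger` are hypothetical helper names, and the file path is whatever the caller supplies. The idea is that a handler's level sets only a lower bound, so keeping a destination to exactly one level needs a filter as well.

```python
import logging


class OnlyLevel(logging.Filter):
    """Hypothetical helper: pass only records at exactly the given level."""

    def __init__(self, level):
        super().__init__()
        self.level = level

    def filter(self, record):
        return record.levelno == self.level


def build_logger(name, debug_path):
    """Hypothetical helper: INFO and above to console, DEBUG only to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)   # let DEBUG records reach the handlers
    logger.propagate = False         # avoid duplicate output via the root logger
    logger.handlers.clear()          # keep repeated calls from stacking handlers

    # Console handler: the level threshold drops DEBUG records.
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)

    # File handler: the filter keeps DEBUG records and drops everything else.
    debug_file = logging.FileHandler(debug_path, mode="w")
    debug_file.setLevel(logging.DEBUG)
    debug_file.addFilter(OnlyLevel(logging.DEBUG))

    logger.addHandler(console)
    logger.addHandler(debug_file)
    return logger
```

Calling `build_logger("my_project", "debug.log")` and then logging at both levels should leave only the DEBUG records in the file. The same handler and filter layout can be expressed as a dictionary for `logging.config.dictConfig` if the project loads its logging setup from a config file.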
2
votes
1 answer

Is there a way to have files in the Kedro Catalog, that are missing?

I have a kedro pipeline which generates a file that is used again for the next run of that same pipeline. However, when the pipeline runs for the first time, that file does not exist, and it is handled in a node in the pipeline. Kedro throws an…
2
votes
1 answer

S3FS is not recognised in lambda when using torch

I have a Kedro pipeline node running on AWS Lambda that accesses S3. It runs if I'm not using torch but fails with "Install s3fs to access S3" if I add torch as a dependency. I have a Kedro pipeline I want to deploy on AWS Step Functions. My requirements…
2
votes
1 answer

Where is my kedro output when using the databricks extension in vscode

I am using kedro together with the databricks extension for vscode to access databricks server on Azure. Everything works pretty well but I don't see any output when executing the file locally. The only output I receive is: 31/03/2023, 15:09:32 -…
YoniPrv
  • 73
  • 5
2
votes
0 answers

Kedro catalog fails when overwriting a GeoJson dataset even though the driver is supported

I have the following catalog item in my kedro project suggested_routes_table@geopandas: type: geopandas.GeoJSONDataSet filepath: data/04_feature/routes_suggestions_table.geojson load_args: driver: "GeoJSON" mode: "a" The keyword argument mode: "a"…
Natalio
  • 31
  • 4
2
votes
2 answers

How to use Kedro with Great-expectations?

I am using Kedro to create a pipeline for ETL purposes and column specific validations are being done using Great-Expectations. There is a hooks.py file listed in Kedro documentation here. This hook is registered as per the instructions mentioned on…