Highest Voted 'kedro' Questions

4 votes, 1 answer

Where should a node's output be saved in Kedro?

In Kedro, we can chain different nodes into a pipeline and run only some of them. When running only some nodes, we need to save some inputs from the nodes somewhere so that, when another node is run, it can access the data that the previous node has…

python tensorflow kedro

asked Oct 18 '19 at 04:03

Baenka (243 reputation; badges: 3, 15)
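The core of this question is that outputs held only in memory disappear between runs, so a later partial run has nothing to load unless each intermediate output is written to persistent storage. The standard-library sketch below uses a hypothetical `FileCatalog` class (not Kedro's `DataCatalog`) that pickles each named output to a directory, so a second "partial run" with a fresh catalog instance can load what the first one saved. In Kedro itself, the analogous step is declaring the intermediate output in `catalog.yml` instead of leaving it as an unregistered in-memory dataset.

```python
import pickle
import tempfile
from pathlib import Path


class FileCatalog:
    """Hypothetical catalog that persists every dataset as a pickle file."""

    def __init__(self, root) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        return self.root / f"{name}.pkl"

    def save(self, name: str, data) -> None:
        with self._path(name).open("wb") as f:
            pickle.dump(data, f)

    def load(self, name: str):
        with self._path(name).open("rb") as f:
            return pickle.load(f)

    def exists(self, name: str) -> bool:
        return self._path(name).exists()


def preprocess(raw):
    # First node: transform the raw data.
    return [x * 2 for x in raw]


def train(features):
    # Second node: consumes the first node's output.
    return sum(features)


root = Path(tempfile.mkdtemp())

# Run 1: only the first node runs, and its output is persisted.
FileCatalog(root).save("features", preprocess([1, 2, 3]))

# Run 2: a fresh catalog instance (simulating a new process) runs only the
# second node, loading the output that the first run persisted.
catalog = FileCatalog(root)
model = train(catalog.load("features"))
print(model)  # 12
```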

3 votes, 1 answer

Kedro - How to update a dataset in a Kedro pipeline given that a dataset cannot be both input and output of a node (only DAG)?

In a Kedro project, I have a dataset in catalog.yml that I need to increment by adding a few lines each time I call my pipeline. #catalog.yml my_main_dataset: type: pandas.SQLTableDataSet credentials: postgrey_credentials save_args: …

python pandas kedro

asked Nov 07 '22 at 17:42

Jean-Baptiste (31 reputation; badges: 2)
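A common way around the DAG restriction is to give the same underlying table two names: one that the node loads and another that it saves to in append mode, so the graph contains no cycle. The sketch below illustrates the idea with the standard-library `sqlite3` module rather than `pandas.SQLTableDataSet`; the functions `existing_rows`, `new_rows_sink`, and `add_rows` are hypothetical. In a Kedro catalog, the analogous setup would be a second entry pointing at the same table with `if_exists: append` in its `save_args` (the pandas `to_sql` option); treat that as an assumption to check against your Kedro version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_main_dataset (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO my_main_dataset VALUES (?, ?)", [(1, "a"), (2, "b")]
)


def existing_rows():
    # "Input" name: reads the current contents of the table.
    return conn.execute(
        "SELECT id, value FROM my_main_dataset ORDER BY id"
    ).fetchall()


def new_rows_sink(rows):
    # "Output" name: appends to the same table instead of replacing it.
    conn.executemany("INSERT INTO my_main_dataset VALUES (?, ?)", rows)
    conn.commit()


def add_rows(current):
    # Node logic: derive only the new rows from what is already stored.
    next_id = max(row[0] for row in current) + 1
    return [(next_id, "c"), (next_id + 1, "d")]


# The node reads one name and writes another, so there is no cycle.
new_rows_sink(add_rows(existing_rows()))
print(existing_rows())
```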

3 votes, 2 answers

AttributeError: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas

I'm quite new to Kedro, and after installing kedro in my conda environment, I'm getting the following error when trying to list my catalog: Command performed: kedro catalog list Error: kedro.io.core.DataSetError: An exception occurred when…

python pip conda kedro

asked Jan 15 '22 at 05:25

Rubens Rodrigues (165 reputation; badges: 1, 2, 10)

3 votes, 2 answers

Kedro - Memory management

I am working on a Kedro 0.17.2 project that is running into out-of-memory issues, and I'm trying to reduce the memory footprint. I'm doing the profiling by using mprof from the memory-profiler library, and I noticed that there is always a child process…

python pandas out-of-memory kedro

asked Oct 16 '21 at 01:54

lspinheiro (423 reputation; badges: 1, 4, 9)

3 votes, 1 answer

Host and port specified in mlflow.yml are ignored by "kedro mlflow ui", which still uses the default (localhost:5000)

I built a sample kedro project following this page and specified my global IP address as the host in mlflow.yml, but when I run the "kedro mlflow ui" command, it still listens on localhost. Even when I only specify port 5001 (not the default) in mlflow.yml, it does not…

mlflow kedro

asked Apr 02 '21 at 09:20

RCheng (31 reputation; badges: 1, 2)

3 votes, 1 answer

How to use tf.data.Dataset with kedro?

I am using tf.data.Dataset to prepare a streaming dataset which is used to train a tf.keras model. With kedro, is there a way to create a node and return the created tf.data.Dataset to use it in the next training node? The MemoryDataset will…

tensorflow pickle tensorflow-datasets kedro tf.data.dataset

asked Sep 03 '20 at 18:59

evolved (1,850 reputation; badges: 19, 40)
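Kedro's `MemoryDataset` (spelled `MemoryDataSet` in older releases) accepts a `copy_mode` argument with the values `"deepcopy"`, `"copy"`, and `"assign"`. For objects that cannot be deep-copied or pickled, `"assign"` hands the next node the very same object instead of a copy; whether a given `tf.data.Dataset` needs it is an assumption to verify in your setup. The standard-library sketch below imitates that behaviour with a hypothetical `InMemoryStore` class and uses a Python generator as a stand-in for a streaming dataset; it imports neither Kedro nor TensorFlow.

```python
import copy


class InMemoryStore:
    """Hypothetical stand-in for an in-memory dataset with a copy mode."""

    def __init__(self, copy_mode: str = "deepcopy") -> None:
        if copy_mode not in {"deepcopy", "copy", "assign"}:
            raise ValueError(f"unknown copy_mode: {copy_mode}")
        self.copy_mode = copy_mode
        self._data = None

    def _copy(self, data):
        if self.copy_mode == "deepcopy":
            return copy.deepcopy(data)
        if self.copy_mode == "copy":
            return copy.copy(data)
        return data  # "assign": pass the very same object along

    def save(self, data) -> None:
        self._data = self._copy(data)

    def load(self):
        return self._copy(self._data)


def make_stream():
    # A generator mimics a lazily evaluated pipeline that cannot be pickled.
    return (x * x for x in range(5))


stream = make_stream()

try:
    InMemoryStore("deepcopy").save(stream)
except TypeError as exc:
    # Deep-copying a generator raises TypeError before it is consumed.
    print("deepcopy failed:", exc)

store = InMemoryStore("assign")
store.save(stream)
print(store.load() is stream)  # True
print(list(store.load()))      # [0, 1, 4, 9, 16]
```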

3 votes, 1 answer

How can I read/write data from/to network attached storage with kedro?

In the API docs about kedro.io and kedro.contrib.io, I could not find information about how to read/write data from/to network attached storage such as a FritzBox NAS.

python kedro

asked May 14 '20 at 07:30

thinwybk (4,193 reputation; badges: 2, 40, 76)

3 votes, 1 answer

How to convert Spark data frame to Pandas and back in Kedro?

I'm trying to understand the optimal way in Kedro to convert a Spark DataFrame coming out of one node into the Pandas DataFrame required as input for another node, without creating a redundant conversion step.

python pandas pyspark kedro

asked Nov 11 '19 at 19:33

Dmitry Deryabin (1,518 reputation; badges: 2, 14, 27)

3 votes, 1 answer

How to change the process count of the ParallelRunner in Kedro?

My pipeline makes a lot of HTTP requests. It’s not a CPU-heavy operation, so I’d like to spin up more processes than there are CPU cores. How can I change this?

python kedro

asked Nov 11 '19 at 09:46

921kiyo (584 reputation; badges: 4, 14)
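For I/O-bound work such as HTTP requests, workers spend most of their time waiting, so a pool larger than the CPU count is reasonable. Kedro's `ParallelRunner` and `ThreadRunner` both accept a `max_workers` argument in recent releases, but the sketch below stays in the standard library: it uses `concurrent.futures.ThreadPoolExecutor` with more workers than `os.cpu_count()`, and `fetch` is a hypothetical placeholder that sleeps instead of making a real request.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> str:
    # Hypothetical stand-in for an HTTP call: it only waits, like network I/O.
    time.sleep(0.1)
    return f"response from {url}"


urls = [f"https://example.com/item/{i}" for i in range(32)]

# I/O-bound tasks: oversubscribing the CPUs is fine because threads mostly wait.
workers = 4 * (os.cpu_count() or 1)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=workers) as pool:
    # pool.map returns results in the same order as the inputs.
    responses = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

print(f"{len(responses)} responses with {workers} workers in {elapsed:.2f}s")
```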

2 votes, 0 answers

Logging in Python and Kedro: how to log only DEBUG messages to a file and INFO to the console

I'm trying to configure logging so that INFO level messages go to the console and DEBUG level messages go to a file instead. So far, I have INFO to the console and DEBUG to the file working; the problem is that the DEBUG messages are also being output to…

python python-logging kedro

asked Jul 26 '23 at 21:35

Emilio (33 reputation; badges: 2)
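With the standard `logging` module, one way to get this split is to give the console handler a level of INFO and attach a filter to the file handler that lets only DEBUG records through (a handler level alone cannot do this, because levels are lower bounds). The sketch below configures a hypothetical logger named `my_project` in code; Kedro projects usually configure logging through a YAML file passed to `logging.config.dictConfig`, and wiring a custom filter into that file is left as an assumption to adapt.

```python
import logging
import sys
import tempfile
from pathlib import Path


class OnlyDebugFilter(logging.Filter):
    """Pass only records whose level is exactly DEBUG."""

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno == logging.DEBUG


log_file = Path(tempfile.mkdtemp()) / "debug.log"

logger = logging.getLogger("my_project")
logger.setLevel(logging.DEBUG)  # let every record reach the handlers
logger.propagate = False        # keep the root logger from duplicating output

console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.INFO)  # INFO and above go to the console

file_handler = logging.FileHandler(log_file, encoding="utf-8")
file_handler.setLevel(logging.DEBUG)
file_handler.addFilter(OnlyDebugFilter())  # DEBUG only, nothing higher

logger.addHandler(console)
logger.addHandler(file_handler)

logger.debug("written to the file only")
logger.info("written to the console only")

# Detach and close the file handler so its contents are flushed to disk.
logger.removeHandler(file_handler)
file_handler.close()
print(log_file.read_text(encoding="utf-8"))
```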

2 votes, 1 answer

Is there a way to have files in the Kedro catalog that are missing?

I have a kedro pipeline which generates a file that is used again for the next run of that same pipeline. However, when the pipeline runs for the first time, that file does not exist, and it is handled in a node in the pipeline. Kedro throws an…

python kedro

asked Jun 26 '23 at 14:44

Nandha Kumar (57 reputation; badges: 5)
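One pattern for a file that legitimately does not exist on the first run is a dataset whose load returns a default instead of raising. The sketch below is a hypothetical `OptionalJSONDataset` written with only the standard library; in a real project this would become a custom dataset class (for example one built on Kedro's `AbstractDataset`), which is an assumption not shown here. Only the tolerant load side is the point; how the saved file is registered in the catalog is left out.

```python
import json
import tempfile
from pathlib import Path


class OptionalJSONDataset:
    """Hypothetical JSON dataset that tolerates a missing file on load."""

    def __init__(self, filepath, default=None) -> None:
        self.filepath = Path(filepath)
        self.default = default

    def load(self):
        if not self.filepath.exists():
            # First run: nothing saved yet, so hand the node a default.
            return self.default
        return json.loads(self.filepath.read_text(encoding="utf-8"))

    def save(self, data) -> None:
        self.filepath.parent.mkdir(parents=True, exist_ok=True)
        self.filepath.write_text(json.dumps(data), encoding="utf-8")


def update_state(previous):
    # Node logic: start fresh when there is no previous state.
    state = previous if previous is not None else {"runs": 0}
    return {"runs": state["runs"] + 1}


dataset = OptionalJSONDataset(Path(tempfile.mkdtemp()) / "state.json")

for _ in range(3):  # three runs; the file only exists after the first one
    dataset.save(update_state(dataset.load()))

print(dataset.load())  # {'runs': 3}
```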

2 votes, 1 answer

s3fs is not recognised in AWS Lambda when using torch

I have a Kedro pipeline node running on AWS Lambda that accesses S3. It runs if I'm not using torch but fails with "Install s3fs to access S3" if I add torch as a dependency. I have a Kedro pipeline I want to deploy on AWS Step Functions. My requirements…

python aws-lambda pytorch kedro

asked Jun 08 '23 at 06:25

Julius Hetzel (41 reputation; badges: 4)

2 votes, 1 answer

Where is my kedro output when using the Databricks extension in VS Code?

I am using kedro together with the Databricks extension for VS Code to access a Databricks server on Azure. Everything works pretty well, but I don't see any output when executing the file locally. The only output I receive is: 31/03/2023, 15:09:32 -…

azure visual-studio-code databricks kedro

asked Mar 31 '23 at 13:18

YoniPrv (73 reputation; badges: 5)

2 votes, 0 answers

Kedro catalog fails when overwriting a GeoJson dataset even though the driver is supported

I have the following catalog item in my kedro project suggested_routes_table@geopandas: type: geopandas.GeoJSONDataSet filepath: data/04_feature/routes_suggestions_table.geojson load_args: driver: "GeoJSON" mode: "a" The keyword argument mode: "a"…

geojson geopandas kedro

asked Jan 26 '23 at 12:41

Natalio (31 reputation; badges: 4)

2 votes, 2 answers

How to use Kedro with Great Expectations?

I am using Kedro to create a pipeline for ETL purposes, and column-specific validations are being done using Great Expectations. There is a hooks.py file listed in the Kedro documentation here. This hook is registered as per the instructions mentioned on…

python airflow kedro great-expectations

asked Dec 20 '22 at 15:28

Dhaval Thakkar (43 reputation; badges: 10)
