Highest Voted 'kedro' Questions

2

votes

1 answer

Load existing data catalog programmatically

I want to write pytest unit tests in Kedro 0.17.5. They need to perform integrity checks on dataframes created by the pipeline. These dataframes are specified in the catalog.yml and have already been persisted successfully using kedro run. The catalog.yml is…

asked Jun 25 '22 at 07:30

movingabout

343
3
10

2

votes

3 answers

Python Kedro PySpark : py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext

It's my first project using Kedro with PySpark and I have an issue. I work on the new Mac (M1). When I run spark-shell in the terminal, Spark is successfully installed and I get the right output (welcome to Spark version 3.2.1 with the picture).…

python java apache-spark pyspark kedro

asked Apr 08 '22 at 12:57

Mathilde Roblot

41
1
1
4

2

votes

2 answers

Is there a package in R that mimics KEDRO as a modular collaborative framework for development?

I currently work with Kedro (from QuantumBlack, https://kedro.readthedocs.io/en/stable/01_introduction/01_introduction.html) as a deployment-oriented framework for coding collaboratively. It is a great framework to develop machine…

r frameworks collaboration kedro

asked Dec 15 '21 at 14:58

Felipe Alvarenga

2,572
1
17
36

2

votes

1 answer

Waiting for nodes to finish in Kedro

I have a pipeline in Kedro that looks like this: from kedro.pipeline import Pipeline, node from .nodes import * def foo(): return Pipeline([ node(a, inputs=["train_x", "test_x"], outputs=dict(bar_a="bar_a"), name="A"), node(b,…

python artificial-intelligence kedro mlops

asked Jul 19 '21 at 15:18

João Areias

1,192
11
41

2

votes

1 answer

Kedro install fails, but succeeds after a few attempts

I have to test whether my Kedro project works from GitHub, so I create a new environment, then: git clone pip install kedro kedro[pandas] kedro-viz jupyter kedro build-reqs kedro install and the install fails, then I retry a few time…

python pipeline kedro

asked Jun 11 '21 at 18:00

Charles Roy

23
4

2

votes

2 answers

Kedro Data Modelling

We are struggling to model our data correctly for use in Kedro. We are using the recommended Raw\Int\Prm\Ft\Mst model but are struggling with some of the concepts, e.g. when is a dataset a feature rather than a primary dataset? The distinction…

kedro

asked Jun 10 '21 at 17:24

SinisterPenguin

1,610
15
17

2

votes

2 answers

Kedro context and catalog missing from Jupyter Notebook

I am able to run my pipelines using the kedro run command without issue. For some reason though I can't access my context and catalog from Jupyter Notebook anymore. When I run kedro jupyter notebook and start a new (or existing) notebook using my…

kedro

asked Feb 02 '21 at 16:54

Pierre Delecto

455
1
7
26

2

votes

2 answers

How do I add a directory of .wav files to the Kedro data catalogue?

This is my first time trying to use the Kedro package. I have a list of .wav files in an s3 bucket, and I'm keen to know how I can have them available within the Kedro data catalog. Any thoughts?

amazon-s3 kedro

asked Jan 26 '21 at 11:22

Myccha

961
1
11
20

2

votes

1 answer

Why does my Kedro logging file stay empty? Am I missing a step?

I am using Kedro but I can't get my logging file to be used. I am following the tutorial. The log file was created but is still empty. Steps done: Configured logging class ProjectContext(KedroContext): def _setup_logging(self) -> None: …

python python-3.x logging kedro

asked Oct 16 '20 at 16:53

Antunes

41
4

2

votes

1 answer

PartitionedDataSet not found when Kedro pipeline is run in Docker

I have multiple text files in an S3 bucket which I read and process. So I defined a PartitionedDataSet in the Kedro data catalog, which looks like this: raw_data: type: PartitionedDataSet path: s3://reads/raw dataset: pandas.CSVDataSet load_args: …

docker kedro

asked Sep 22 '20 at 08:56

mendo

86
5

2

votes

1 answer

How to catalog datasets & models by S3 URI, but keep a local copy?

I'm trying to figure out how to store intermediate Kedro pipeline objects both locally AND on S3. In particular, say I have a dataset on S3: my_big_dataset.hdf5: type: kedro.extras.datasets.pandas.HDFDataSet filepath:…

amazon-s3 caching devops kedro

asked Aug 09 '20 at 21:28

crypdick

16,152
7
51
74

2

votes

1 answer

Does kedro support tfrecord?

To train tensorflow keras models on AI Platform using Docker containers, we convert our raw images stored on GCS to a tfrecord dataset using tf.data.Dataset. Thereby the data is never stored locally. Instead the raw images are transformed directly…

google-cloud-platform tfrecord gcp-ai-platform-training kedro tf.data.dataset

asked Jul 30 '20 at 22:41

evolved

1,850
19
40

2

votes

1 answer

Dynamic instance of pipeline execution based on dataset partition/iterator logic

Not sure if this is possible, but this is what I am trying to do: I want to extract portions (steps) of a function as individual nodes (OK so far), but the catch is that I have an iterator on top of the steps, which depends on some logic on…

python kedro

asked Jun 09 '20 at 13:35

Mohit

1,045
4
18
45

2

votes

1 answer

Does Kedro support Checkpointing/Caching of Results?

Let's say we have multiple long-running pipeline nodes. It seems quite straightforward to checkpoint or cache the intermediate results, so that when nodes after a checkpoint are changed or added, only these nodes must be executed again. Does Kedro…

kedro

asked Jun 05 '20 at 12:48

Sir ExecLP

83
1
5

2

votes

2 answers

Passing nested parameters in the extra_params of the load_context in Kedro

I am trying to load a Kedro context with some extra parameters. My intention is to update the configs in parameters.yml with only the ones passed in extra_params (so the rest of the configs should remain the same). I will then use this instance of context…

python kedro

asked Jun 05 '20 at 12:14

Mohit

1,045
4
18
45
