1

Is it possible to define the data load version in the Kedro data catalog?

  type: pandas.CSVDataSet
  filepath: data/01_raw/company/cars.csv
  versioned: True
  load_version: $USER_DEFINED_VERSION # Wanted to do this

Currently, Kedro supports specifying the load version through the CLI, but it would be easier to specify it in the data catalog instead.

kedro run --load-version="cars.csv:YYYY-MM-DDThh.mm.ss.sssZ"
mediumnok

1 Answer

3

Load versions fall under the category of runtime configuration. It was a deliberate decision to not include load_version as another key, out of a wish to separate runtime configuration from the data catalog. If you wanted to specify multiple load versions and it's cumbersome to do so from the CLI, you can take advantage of the ability to execute kedro run -c config.yml and specify your runtime configuration/params in config.yml.
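If several datasets need pinned load versions, the CLI invocation can be generated rather than typed by hand. The following is only a minimal sketch: the helper name `build_kedro_run_command` and the example timestamp are hypothetical, and it assumes that `--load-version` can be passed once per dataset in the `dataset:version` form shown in the question.

```python
import shlex
from typing import Dict, List


def build_kedro_run_command(load_versions: Dict[str, str]) -> List[str]:
    """Build an argv list for `kedro run`, one --load-version flag per dataset.

    Hypothetical helper: it only formats strings and does not call Kedro.
    """
    command = ["kedro", "run"]
    # Sort by dataset name so the generated command is deterministic.
    for dataset_name, version in sorted(load_versions.items()):
        command.append(f"--load-version={dataset_name}:{version}")
    return command


if __name__ == "__main__":
    # The timestamp is a made-up example in the YYYY-MM-DDThh.mm.ss.sssZ format.
    versions = {"cars.csv": "2020-11-17T14.47.00.000Z"}
    print(shlex.join(build_kedro_run_command(versions)))
```

The returned list can be handed to `subprocess.run`, which keeps the version mapping in one place (for example a small Python or YAML file under version control) without adding keys to the catalog.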

  • I find it easier to manage the version right next to the dataset definition, in one configuration file; after all, there are already quite a few config files inside Kedro. Without this key, I would have to set versioned: false and point the filepath at the specific timestamp folder. On the other hand, Kedro allows overriding a defined parameter from the CLI. – mediumnok Nov 17 '20 at 14:47
  • I found that DataCatalog does accept a version argument, but it has no effect at run time. – mediumnok Nov 17 '20 at 18:03
  • "I found datacatalog do accept a version argument, but it has no effect during run-time." Yes that's intended for usage in an interactive session, like ipython or jupyter, for exploration, rather than for an actual run. It's not an ideal solution, because we now have the same class (DataCatalog) that is trying to solve for two divergent workflows, but that's a story for another time. – Lorena Balan Nov 18 '20 at 15:29
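The workaround mentioned in the comments (turning versioning off and pointing at a timestamp folder) relies on how versioned files are laid out on disk. Below is a rough plain-Python sketch of that lookup, not Kedro's own code. It assumes the layout `<filepath>/<version>/<filename>` and that version folder names sort chronologically; `resolve_versioned_path` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def resolve_versioned_path(filepath: str, load_version: Optional[str] = None) -> Path:
    """Return the file for a given load version, or for the latest one if None.

    Assumed layout (not taken from the thread): <filepath>/<version>/<filename>.
    """
    base = Path(filepath)
    if load_version is None:
        # Timestamp-style folder names sort lexicographically in time order.
        versions = sorted(p.name for p in base.iterdir() if p.is_dir())
        if not versions:
            raise FileNotFoundError(f"no versions found under {base}")
        load_version = versions[-1]
    resolved = base / load_version / base.name
    if not resolved.exists():
        raise FileNotFoundError(f"version {load_version!r} not found for {base}")
    return resolved
```

Calling `resolve_versioned_path("data/01_raw/company/cars.csv", "<timestamp>")` would give the path that an unversioned catalog entry could point at directly, which is the manual alternative described above.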