
I'm trying to replicate some common data & analytics workflows using Delta Live Tables. Currently I'm struggling to work out how to achieve the requirements below:

  • Have different targets (hive metastore) to write into based on dev or prod
  • Being able to pull from different branches based on dev or prod pipeline

Let's assume I have a single Delta Live Table pipeline that imports multiple notebooks. The notebooks sit in a repo location, and I don't see an option to point to a specific branch. This prevents me from having multiple copies of the pipeline, one with dev and one with prod as the target.

When scheduling notebooks using Jobs and Tasks (rather than a single Delta Live Table pipeline, which can contain multiple notebooks), I can select the branch. The downside is that I'm essentially defining the DAG manually based on how I glue the tasks together. Not very robust.

Is there a way to achieve the same with Delta Live Table pipelines?

Michael Brenndoerfer

1 Answer


You simply need two different DLT pipelines defined over the same code. Each pipeline will (settings docs):

  • have a different target configuration, pointing to a specific database/schema
  • have a different storage configuration, pointing to a specific storage location (S3/ADLS/...)
  • use notebooks from a repository checkout of the specific branch. It's easy to create multiple checkouts of the same Git repository in Databricks Repos, and you can then keep these checkouts in sync with Git using some automation, as shown in this demo.
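As a rough sketch, the two pipeline definitions would differ only in `target`, `storage`, and the notebook paths. The repo checkout paths, pipeline names, and storage locations below are hypothetical placeholders; `name`, `target`, `storage`, and `libraries` are standard DLT pipeline settings keys:

```json
{
  "name": "my-pipeline-dev",
  "target": "analytics_dev",
  "storage": "s3://my-bucket/dlt/dev",
  "libraries": [
    { "notebook": { "path": "/Repos/automation/project-dev/pipelines/ingest" } }
  ]
}
```

The prod pipeline would then use `"target": "analytics_prod"`, a prod storage path, and notebook paths under a checkout pinned to the prod branch (e.g. `/Repos/automation/project-prod/...`).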
Alex Ott
  • Hi Alex, thanks for your answer. I didn't realize it was possible to have the same repo, but with different "checkouts pinned". Still a bit clunky tbh (it'd be nicer to be able to point to a branch when adding the notebook path in DLT settings [similar to scheduling a DLT with Jobs]) but this indeed should hopefully work in the interim - thanks! – Michael Brenndoerfer Nov 26 '22 at 18:42
  • I'm not sure about plans (right now) for adding Git support for DLT-based jobs. If you have a solution architect or customer success engineer on your account, I would recommend bringing this topic to them, so they can communicate it to the product team. – Alex Ott Nov 26 '22 at 19:00
  • Would you point the DLT pipeline to a particular checkout of the repo that has been cloned to Workspace/Repos/$USER/? So a DLT pipeline is tied to a particular user's cloned repo? – Oliver Angelil Aug 07 '23 at 19:26
  • No, now I see the explanation in your nutter GitHub repo... the folder trick under Repos... yes, a bit clunky. I'll try it. – Oliver Angelil Aug 07 '23 at 19:38
  • You can have different checkouts under the user folder as well. See this demo specific to DLT: https://github.com/alexott/dlt-files-in-repos-demo – Alex Ott Aug 07 '23 at 19:41