
Currently, we are using Databricks Repos with Azure DevOps as the Git provider.

Azure DevOps is configured to deploy every update of each branch [dev, main, release] using:

databricks repos update --path /Repos/dev/my_repo --branch dev
databricks repos update --path /Repos/staging/my_repo --branch main
databricks repos update --path /Repos/prod/my_repo --branch release

It's working well.

We are struggling to define a proper process for unit testing our notebooks, especially the merged ("staging") code of an Azure DevOps pull request, before accepting it.

Since we use only notebooks, we have to push the pull request's merged code to Databricks and then run the unit tests there.

This is the CI workflow I would love to implement:

  • Create a repo for "unit_test" in /Repos/test/my_repo

  • A user creates a feature branch and opens a pull request to dev.

  • During Azure DevOps CI, after conflicts are resolved, update '/Repos/test/my_repo' with the freshly merged code from the Azure DevOps pull request using:

databricks repos update --path /Repos/test/my_repo --branch 'merged branch from pull request'
  • Then trigger the Databricks Workflow associated with that repo (/Repos/test/my_repo), which runs the unit tests.
  • If the unit tests pass in the Databricks Workflow run, accept the pull request in Azure DevOps and update the dev branch.
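
To make the steps above concrete, here is a rough sketch of what I imagine the CI script doing. It is an idea I have not verified, and it rests on several assumptions: that the PR build has the merged code checked out and exposes a ref of the form refs/pull/<id>/merge, that the pipeline identity is allowed to push a temporary branch, and that Databricks Repos can only be pointed at a real branch. The `ci/pr-<id>` branch name, the PR number 123 and the job ID 456 are made up for illustration. By default the script only prints the commands (DRY_RUN=1).

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands for each step unless DRY_RUN=0.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Turn a PR merge ref (refs/pull/<id>/merge) into a temporary branch name.
# The "ci/pr-<id>" naming scheme is hypothetical.
pr_test_branch() {
  local ref="$1"
  local id="${ref#refs/pull/}"
  id="${id%/merge}"
  echo "ci/pr-${id}"
}

# Publish the merged code as a branch, point the test repo at it,
# then start the unit-test job.
test_pull_request() {
  local ref="$1" job_id="$2"
  local branch
  branch="$(pr_test_branch "$ref")"
  run git push --force origin "HEAD:refs/heads/${branch}"
  run databricks repos update --path /Repos/test/my_repo --branch "$branch"
  run databricks jobs run-now --job-id "$job_id"
}

# Example values (hypothetical PR number and job ID).
test_pull_request "refs/pull/123/merge" "456"
```

The temporary branch would have to be deleted again once the pull request is completed or abandoned, which this sketch does not cover.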

I looked at Alex Ott's example (https://github.com/alexott/databricks-nutter-repos-demo/blob/master/azure-pipelines.yml), which deploys 'sourceBranch' to test, not the merge of 'sourceBranch' and 'targetBranch'.

Is there a way, in Azure DevOps, to access the merged pull request code as a Git branch name or tag, so that I can push it to Databricks?

Here is a quick Azure DevOps stage to illustrate what I need:

- stage: Unit_test_PR_on_databricks
  condition: and(eq(variables['Build.Reason'], 'PullRequest'), eq(variables['System.PullRequest.TargetBranch'], 'refs/heads/dev'))
  jobs:
    - job: unit_tests
      displayName: "Unit tests"
      pool:
        vmImage: ubuntu-latest
      steps:
      - script: |
          echo "Testing code from PR on Databricks"
          
          # Update the test repo with the merged PR code
          # (branchName is the value I don't know how to obtain)
          databricks repos update --path /Repos/test/my_repo --branch "$(branchName)"

          # Trigger the workflow that runs the unit tests
          databricks jobs run-now --job-id "$(job_test_workflow)"
        displayName: 'Trigger unit test on Databricks'

I don't want to accept any pull request into [dev/main/release] unless the unit tests have been executed successfully on Databricks.
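
For this gating part, note that run-now only starts the job, so the pipeline would also have to wait for the run and check its outcome. Below is a minimal sketch of the decision I have in mind, assuming the run's life cycle state and result state are read from the `state` object that the Databricks Jobs API returns for a run. The function name `run_verdict` is hypothetical, and fetching the state itself is left out.

```shell
#!/usr/bin/env bash
set -eu

# Map a Databricks job run state to a CI decision:
#   wait = keep polling, pass = allow the PR, fail = block the PR.
run_verdict() {
  local life_cycle_state="$1" result_state="${2:-}"
  case "$life_cycle_state" in
    QUEUED|PENDING|RUNNING|TERMINATING|WAITING_FOR_RETRY|BLOCKED)
      echo "wait" ;;
    TERMINATED)
      if [ "$result_state" = "SUCCESS" ]; then
        echo "pass"
      else
        echo "fail"
      fi ;;
    *)
      # SKIPPED, INTERNAL_ERROR or anything unexpected blocks the PR.
      echo "fail" ;;
  esac
}

run_verdict RUNNING
run_verdict TERMINATED SUCCESS
run_verdict TERMINATED FAILED
```

Treating every unknown state as a failure is deliberate: I would rather block a pull request by mistake than let untested code through.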

Currently, judging from all the documentation, it seems that the only way with Databricks Repos is:

  • Accept the PR.
  • Deploy the just-accepted PR to Databricks and run the unit tests.
  • If they fail, open a new PR with fixes.

With that kind of process, the Git branches (especially dev) will never be really clean, and there will be many 'fix' pull requests whenever unit tests fail after merging.

Gohmz
  • Is it an Azure DevOps repository, or something like GitHub? There are some differences between Git service implementations that affect how the branch is specified. – Alex Ott Feb 16 '23 at 13:35
  • It's Azure DevOps. I used your variable name: https://github.com/alexott/databricks-nutter-repos-demo/blob/669956322823a88eb99ffb6d4f0182781ad2092e/azure-pipelines.yml#L6 Let's say I do a PR between dev and main. There's a conflict during the PR, and I keep some files from the main branch. With your CI, it will deploy and test the 'dev' branch, not the 'resolved conflict' branch. – Gohmz Feb 16 '23 at 15:17
