We currently use Databricks Repos with Azure DevOps as the Git provider.
Azure DevOps is configured to deploy each new update to the matching branch [dev, main, release] using
databricks repos update --path /Repos/dev/my_repo --branch dev
databricks repos update --path /Repos/staging/my_repo --branch main
databricks repos update --path /Repos/prod/my_repo --branch release
It's working well.
We are struggling to define a proper process for unit testing our notebooks, especially the staged code of an Azure DevOps pull request, before accepting it.
Since we only use notebooks, we have to push the staged pull request code to Databricks and then run the unit tests there.
This is the CI workflow I would love to implement:
- Create a repo for unit tests in /Repos/test/my_repo.
- A user creates a feature branch and opens a pull request to dev.
- During Azure DevOps CI, after conflicts are resolved, update '/Repos/test/my_repo' with the freshly merged code from the Azure DevOps pull request using
databricks repos update --path /Repos/test/my_repo --branch 'merged branch from pull request'
- Then trigger the Databricks Workflow associated with /Repos/test/my_repo, which runs the unit tests.
- If the unit tests pass in the Databricks Workflow run, accept the pull request in Azure DevOps and update the dev branch.
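The steps above could be sketched as a dry run that only prints the commands in order. This is a rough sketch under assumptions, not a confirmed solution: it assumes the pipeline has checked out the pull request merge commit, that the build identity is allowed to push, and that a temporary branch can stand in for the 'merged branch from pull request'. The `ci/pr-<id>` naming, the two function names, and the sample ids 123 and 456 are hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands instead of executing them.
set -euo pipefail

# Build the name of a temporary branch for a pull request
# (hypothetical naming scheme, not from the original post).
pr_branch_name() {
  printf 'ci/pr-%s' "$1"
}

# Print the commands for one PR, in order.
# $1 = pull request id, $2 = Databricks job id of the unit-test workflow.
plan_pr_test() {
  local branch
  branch="$(pr_branch_name "$1")"
  # Assumption: HEAD is the merged PR code checked out by the pipeline.
  echo "git push --force origin HEAD:refs/heads/${branch}"
  echo "databricks repos update --path /Repos/test/my_repo --branch ${branch}"
  echo "databricks jobs run-now --job-id $2"
  # Remove the temporary branch once the tests have been triggered.
  echo "git push origin --delete ${branch}"
}

plan_pr_test 123 456
```

Note that `databricks jobs run-now` only starts a run, so a real pipeline would still have to wait for the run to finish and fail the build on a failed result.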
I looked at Alex Ott's example (https://github.com/alexott/databricks-nutter-repos-demo/blob/master/azure-pipelines.yml), which deploys the 'sourceBranch' to test, not the merge of 'sourceBranch' + 'targetBranch'.
Is there a way, in Azure DevOps, to access the staged pull request code as a git branch name or tag, so that it can be pushed to Databricks?
Here is a rough Azure DevOps stage to illustrate what I need:
- stage: Unit_test_PR_on_databricks
  condition: and(eq(variables['Build.Reason'], 'PullRequest'), eq(variables['System.PullRequest.TargetBranch'], 'refs/heads/dev'))
  jobs:
  - job: unit_tests
    displayName: "Unit tests"
    pool:
      vmImage: ubuntu-latest
    steps:
    - script: |
        echo "Testing code from PR on Databricks"
        # THIS SHOULD UPDATE test repo with
        databricks repos update --path /Repos/test/my_repo --branch "$(branchName)"
        # THIS SHOULD TRIGGER WORKFLOWS WITH UNITTESTS
        databricks jobs run-now --job-id ${job_test_workflow}
      displayName: 'Trigger unit test on Databricks'
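One possible way to fill in a branch name for such a stage is to derive it from the pull request id. The sketch below assumes that, for pull request builds, Azure DevOps sets `Build.SourceBranch` to a ref of the form `refs/pull/<id>/merge` and exposes it to scripts as the `BUILD_SOURCEBRANCH` environment variable. The function name `pr_id_from_ref`, the `ci/pr-<id>` branch naming, and the sample ref are hypothetical, and the branch would still have to be created from the merge commit before Databricks could check it out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Extract the pull request id from a ref such as refs/pull/123/merge.
# Prints nothing and returns 1 if the ref is not a PR merge ref.
pr_id_from_ref() {
  local ref="$1"
  if [[ "$ref" =~ ^refs/pull/([0-9]+)/merge$ ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# In a pipeline the ref would come from BUILD_SOURCEBRANCH; the default
# below is only a sample value for running the sketch locally.
ref="${BUILD_SOURCEBRANCH:-refs/pull/123/merge}"
if pr_id="$(pr_id_from_ref "$ref")"; then
  # Hypothetical temporary branch name for the merged PR code.
  echo "branchName=ci/pr-${pr_id}"
else
  echo "not a pull request build: ${ref}" >&2
fi
```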
I don't want to accept any pull request on [dev/main/release] without unit tests having run successfully on Databricks.
From all the documentation I have found, it seems that the only way to do this with Databricks Repos is:
- Accept the PR.
- Deploy the just-accepted PR to Databricks and run the unit tests.
- If they fail, open a new PR with fixes.
With that kind of process, the git branches (especially dev) will never be really clean, and there will be many 'fix' pull requests whenever unit tests fail after merging.