
I am developing my code in a Databricks workspace. Using the Repos integration, I version control my code with Azure DevOps.

I would like to use Azure Pipelines to deploy my code to a new test/production environment. To copy the files to the new environment, I use the Databricks command-line interface (databricks-cli). After configuring the CLI, I run

git checkout main
databricks workspace import_dir . /test

to copy the files from the VM to the new Databricks workspace. However, the import_dir command only copies files with certain extensions (not .txt files, for example, so my requirements.txt is not copied) and strips the extensions, converting everything to notebooks.

This is quite problematic: I use relative imports to other Python files, but these files are converted to notebooks, so the imports no longer work. Is there any way to prevent the removal of the extensions? And how can I copy all files instead of only those with certain extensions?
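
For reference, here is roughly the full pipeline step, including the databricks-cli configuration mentioned above (a minimal sketch; the host URL and token are placeholders, with the token supplied from a pipeline secret in practice):

# non-interactive databricks-cli configuration via environment variables
export DATABRICKS_HOST="https://<workspace-url>"
export DATABRICKS_TOKEN="<personal-access-token>"   # injected from a pipeline secret

git checkout main
databricks workspace import_dir . /test   # add -o to overwrite files that already exist in the target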


1 Answer


If you're using databricks workspace import_dir, then you're importing into the Databricks Workspace, which supports only source code in Scala/Python/R. Support for arbitrary files exists only in Databricks Repos, which is a separate entity inside Databricks, a bit different from the Databricks Workspace.

If you want to promote code changes to UAT/production, you can just continue to use Repos: create corresponding repositories in those environments (for example, using databricks repos create), and then promote changes with the databricks repos update command. You can find detailed instructions in the following demo, which shows how to do CI/CD on notebooks in Repos and how to promote code to production.
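
For illustration, a minimal sketch of those two commands, assuming the databricks-cli syntax; the Azure DevOps URL, the workspace path under /Repos, and the branch name are placeholders you would replace with your own:

# one-time setup: clone the Azure DevOps repo into the target workspace
databricks repos create \
  --url https://dev.azure.com/<org>/<project>/_git/<repo> \
  --provider azureDevOpsServices \
  --path /Repos/Production/my-project

# in the release pipeline: switch the repo to the branch (or tag) being promoted
databricks repos update \
  --path /Repos/Production/my-project \
  --branch release

Because Repos keep files as files (including things like requirements.txt), the extension stripping you see with workspace import_dir doesn't come into play here.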
