
Are there any best practices for organizing project folders so that the CI/CD pipeline stays simple?

Here, the following structure is used, which seems to be quite complex:

project
│   README.md
│   azure-pipelines.yml   
│   config.json
│   .gitignore
└─── package1
│       │   __init__.py
│       │   setup.py
│       │   README.md
│       │   file.py
│       └── submodule
│       │      │   file.py
│       │      │   file_test.py     
│       └── requirements
│       │      │   common.txt
│       │      │   dev.txt
│       └── notebooks
│              │   notebook1.txt
│              │   notebook2.txt
└─── package2
│       │   ...
└─── ci_cd_scripts
        │   requirements.py
        │   script1.py
        │   script2.py
        │   ...
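When several packages sit side by side, each with its own setup.py and requirements folder, a helper in ci_cd_scripts could discover them so that azure-pipelines.yml does not need a hard-coded list. This is a minimal sketch under two assumptions: every top-level folder containing a setup.py is a package, and its requirement files live in <package>/requirements/. The name discover_packages is hypothetical.

```python
from pathlib import Path


def discover_packages(repo_root: Path) -> dict[str, list[Path]]:
    """Map each top-level package folder (one that has a setup.py) to its
    requirements/*.txt files. Hypothetical helper for ci_cd_scripts."""
    packages: dict[str, list[Path]] = {}
    # Only direct children of the repo root are considered packages.
    for setup_file in sorted(repo_root.glob("*/setup.py")):
        pkg_dir = setup_file.parent
        req_dir = pkg_dir / "requirements"  # assumed location of common.txt, dev.txt, ...
        reqs = sorted(req_dir.glob("*.txt")) if req_dir.is_dir() else []
        packages[pkg_dir.name] = reqs
    return packages
```

Folders without a setup.py, such as ci_cd_scripts, are skipped, so adding a new package only requires creating its folder.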

Here, the following structure is suggested:

.
├── .dbx
│   └── project.json
├── .github
│   └── workflows
│       ├── onpush.yml
│       └── onrelease.yml
├── .gitignore
├── README.md
├── conf
│   ├── deployment.json
│   └── test
│       └── sample.json
├── pytest.ini
├── sample_project
│   ├── __init__.py
│   ├── common.py
│   └── jobs
│       ├── __init__.py
│       └── sample
│           ├── __init__.py
│           └── entrypoint.py
├── setup.py
├── tests
│   ├── integration
│   │   └── sample_test.py
│   └── unit
│       └── sample_test.py
└── unit-requirements.txt

Concretely, I want to know:

  • Should I use one repo for all packages and notebooks (as in the first approach), or should I create one repo per library (which makes CI/CD more effort, since there might be dependencies between the packages)?
  • With both suggested folder structures, it is unclear to me where to place notebooks that are not related to any specific package (e.g., notebooks that contain my business logic and use the package).
  • Is there a well-established folder structure?
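On the first question: if several packages share one repo, the pipeline has to build a package only after the packages it depends on. It should also rebuild dependents when a shared package changes. This is a minimal sketch that assumes the in-repo dependency graph is written out as a dict. You might instead derive it from each package's install_requires. The package names and helper functions are hypothetical. It uses the standard-library graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+


def build_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order packages so each one comes after the in-repo packages it depends on.

    Raises graphlib.CycleError if packages depend on each other in a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())


def affected_packages(changed: set[str], dependencies: dict[str, set[str]]) -> set[str]:
    """Changed packages plus every package that transitively depends on one of them."""
    affected = set(changed)
    grew = True
    while grew:  # repeat until no new dependent is found
        grew = False
        for pkg, deps in dependencies.items():
            if pkg not in affected and deps & affected:
                affected.add(pkg)
                grew = True
    return affected
```

For example, with package2 depending on package1, a change to package1 marks both as affected, and build_order places package1 first. A single repo keeps this graph in one place. With one repo per library, the same ordering has to be enforced through published package versions instead.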
Alex Ott
user3579222

1 Answer


Databricks had a repository with project templates for use with Databricks (link), but it has since been archived, and template creation is now part of the dbx tool. These two links may be useful for you:

Bartosz Gajda
Thanks, I checked out the dbx init structure for azure-devops: it creates a single-project structure. How would you handle multiple package dependencies,... – user3579222 Dec 06 '22 at 19:50