We have two different teams, each in its own location, working with git, each location having its own reference repository. Each location has access to an enterprise network, but the two networks cannot be directly connected (trust me, we asked): we can only exchange files. We would like to be able to sync the two locations regularly so that the work can be shared through the respective reference repositories.
The requirements:
- Exchanges must be allowed in either direction.
- We need to be able to work on some branches simultaneously from both sides, or at least recover from cases where this happened, even if we expect to work on separate branches most of the time. This implies an integration step may be necessary to handle the divergent work.
- Most tracking must happen automatically, so that manual intervention, and the risk of handling errors that comes with it, is minimized (not that such errors would be fatal, but best to avoid finger-pointing: trust is limited). In particular, the single moving tag example used in the git-bundle man page is laughable, as it will not scale even to a limited number of branches (we have dozens).
- The reference repositories may only be manipulated through remote push/pull and if necessary light administrative operations, both because they are under IT control, and because we want them to be always consistent, i.e. integration is done first, and only then are the changes from the other side published, together with the integration, on the local reference repository.
- We cannot send the whole repository (even tar-gzipped) each time: not only is it a bit big per se, but every package sent is also kept on record as part of contractual commitments, and having N copies of the repository in there is quickly going to become unsustainable.
- All the necessary information must be stored in the local reference repository, so that any developer may perform the syncing steps, without depending on information stored in the local repository(ies) of a particular developer.
- Work with git, not against it, as far as possible. The weirder the workflow is, the more likely it is to break because of a change in git or some other unexpected condition.
Non-requirements:
- Handling more than two disconnected sites. Two is going to be challenging enough already.
- Nightly processing. Exchanges are going to be triggered and handled manually.
- Limited number or complexity of commands. If many intricate commands are necessary, so be it, we can always hide that complexity in a script.
- Crossing the offline syncs. That always means trouble, just like with streams. Ergo, we can assume offline sync operations are totally ordered, regardless of their directions, taking turns if necessary.
- Branch management details, etc. That is our internal business.
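
To make the incremental-transfer and tracking requirements a bit more concrete, here is a rough sketch of the kind of bookkeeping we have in mind. It is not a solution. The function name `bundle_command` and the two dictionaries are hypothetical: `local_tips` stands for the current branch tips of the local reference repository, and `last_synced` for the tips the other site is known to have, which would have to be recorded somewhere in the reference repository itself. The only git behaviour it relies on is that `git bundle create` accepts rev-list style arguments, including `^<commit>` exclusions.

```python
from typing import Dict, List


def bundle_command(bundle_path: str,
                   local_tips: Dict[str, str],
                   last_synced: Dict[str, str]) -> List[str]:
    """Build a `git bundle create` command line that carries only the
    branches that moved since the last exchange (hypothetical helper)."""
    # Branches that are new or whose tip differs from what was last exchanged.
    changed = sorted(name for name, sha in local_tips.items()
                     if last_synced.get(name) != sha)
    if not changed:
        return []  # nothing to send

    # Everything reachable from the previously exchanged tips is assumed to
    # exist on the other side already, so it is excluded with ^<sha>.
    excludes = sorted({"^" + sha for sha in last_synced.values()})

    # Caveat: a new branch that points at an already-exchanged commit would
    # be excluded by these arguments; that case is not handled here.
    return (["git", "bundle", "create", bundle_path]
            + ["refs/heads/" + name for name in changed]
            + excludes)


if __name__ == "__main__":
    # Placeholder commit ids, for illustration only.
    print(bundle_command("to-other-site.bundle",
                         {"main": "bbb", "dev": "ccc"},
                         {"main": "aaa", "dev": "ccc"}))
```

The open part of the question is where `last_synced` should live so that any developer can rebuild it from the reference repository alone, and how the receiving side integrates divergent branches before publishing.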