
We have two different teams, each in its own location, working with Mercurial, each location having a reference repository. Each location has access to an enterprise network, but the two networks cannot be directly connected (trust me, we asked): we can only exchange files. We would like to sync the two locations regularly so that work can be shared through the respective reference repositories.

The requirements:

  • Exchanges must be allowed in either direction.
  • We need to be able to work on some branches simultaneously from both sides, or at least recover from cases where this happened, even if we expect to work on separate branches most of the time. This implies an integration step may be necessary to handle the divergent work.
  • Most tracking must happen automatically, so that manual intervention, and the attendant risk of manipulation errors, is minimized (not that such errors would be fatal, but best to avoid finger-pointing: trust is limited). We have many branches; handling them one by one is a non-starter.
  • The reference repositories may only be manipulated through remote push/pull and if necessary light administrative operations, both because they are under IT control, and because we want them to be always consistent, i.e. integration is done first, and only then are the changes from the other side published, together with the integration, on the local reference repository.
  • We cannot send the whole repository (even tar-gzipped) each time: it's not only a bit big per se, but also all packages successively sent are kept in records because this is part of contractual commitments, and having N copies of the repository in there is quickly going to become unsustainable.
  • All the necessary information must be stored in a locally central place (same constraints as the reference repositories), so that any developer may perform the syncing steps, without depending on information stored in the local repository(ies) of a particular developer.

Non-requirements:

  • Handling more than two disconnected sites. Two is going to be challenging enough already.
  • Nightly processing. Exchanges are going to be triggered and handled manually.
  • Limited number or complexity of commands. If many intricate commands are necessary, so be it, we can always hide that complexity in a script.
  • Crossing the offline syncs. That always means trouble, just like with streams. Ergo, we can assume offline sync operations are totally ordered, regardless of their directions, taking turns if necessary.
  • Branch management details, etc. That is our internal business.
  • Support of Mercurial bookmarks. We only briefly experimented with them before abandoning them.
Pierre Lebeaupin
  • The context, for what it's worth: our company sold all rights to a product to another company, and for a transition period (measured in years because of the industry) we are to help support the product and train the engineers of that company, which includes simultaneous development during that transition period. However, that sale is not enough to justify integrating the two enterprise networks, and trust is limited anyway: for various reasons that are way above my pay grade we are to keep each other at arm's length. – Pierre Lebeaupin Mar 27 '19 at 22:35
  • Also, everyone involved now wants to switch to git, and this is harder to do, but still possible: https://stackoverflow.com/questions/55319470/offline-syncing-of-locally-central-git-repositories/55319471#55319471 – Pierre Lebeaupin Mar 27 '19 at 22:36
  • Did you consider using a private third-party HG host that both parties could access and use as a central point of synchronization? This wouldn't require any direct relationship between the two enterprises' networks. – StayOnTarget Mar 28 '19 at 11:29

1 Answer


Mercurial makes that easy with its bundles; tracking is best performed by having a clone of the repository that is at the last known state of the remote repository, stored at $SITE_B_IMAGE_URL. Let our location be called site-a and the remote location be called site-b.

  • Generating a bundle to send to the remote location:

    1. ~/work$> hg clone $LOCAL_REF_URL bundler
    2. ~/work$> cd bundler
    3. ~/work/bundler$> hg bundle ../bundle-site-a-$(date +%Y-%m-%d) $SITE_B_IMAGE_URL

    The bundler work repository may now be discarded.

  • Updating the remote tracking repository, once the remote location has confirmed it was able to unbundle the contents of a bundle sent to them:

    1. ~/work$> hg clone $SITE_B_IMAGE_URL remote-tracking
    2. ~/work$> cd remote-tracking
    3. ~/work/remote-tracking$> hg push -R ../bundle-site-a (substitute the actual dated file name the bundle was generated under)

    The remote-tracking work repository may now be discarded.

  • Integrating a bundle from the remote location:

    First, follow the steps to update the remote tracking repository, this time taking the bundle you received, instead of the one you previously sent.

    1. ~/work$> hg clone $LOCAL_REF_URL bundle-integration
    2. ~/work$> cd bundle-integration
    3. ~/work/bundle-integration$> hg unbundle ../bundle-site-b
    4. At this point hg heads will list the heads, including any branch that now has more than one head and therefore needs work to reduce it back to a single head; insert here the work necessary to merge these extraneous branch heads.
    5. If hg push fully succeeds, we are done; stop here.
    6. ~/work/bundle-integration$> hg pull
    7. Take into account any work done at your location while you were busy performing the previous steps, merging new heads as in step 4.
    8. goto 5

    The bundle-integration work repository may now be discarded.

    Note: while you can use the bundle as an overlay with -R, that won't perform the integration; that can only be done by unbundling first.
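    Since a non-requirement above explicitly allows hiding the complexity in a script, the three procedures can be sketched as shell functions. This is a minimal sketch, not a hardened implementation: it assumes LOCAL_REF_URL and SITE_B_IMAGE_URL are provided by the environment, that bundles are passed around as absolute paths, and that merging extraneous branch heads remains a manual step.

    ```shell
    #!/bin/sh
    # Sketch of a wrapper around the procedures above.  Assumptions:
    # LOCAL_REF_URL and SITE_B_IMAGE_URL are set in the environment,
    # and merging extraneous branch heads stays a manual operation.
    set -eu

    # Generate a dated bundle of everything site-b does not have yet.
    generate_bundle() {
        hg clone "$LOCAL_REF_URL" bundler
        bundle="$PWD/bundle-site-a-$(date +%Y-%m-%d)"
        hg -R bundler bundle "$bundle" "$SITE_B_IMAGE_URL"
        rm -rf bundler          # the work repository may now be discarded
        echo "$bundle"
    }

    # Once site-b confirms it could unbundle, record the new known
    # remote state.  $1 = absolute path of the confirmed bundle.
    update_remote_tracking() {
        hg clone "$SITE_B_IMAGE_URL" remote-tracking
        ( cd remote-tracking && hg push -R "$1" )  # bundle as overlay
        rm -rf remote-tracking
    }

    # Integrate a bundle received from site-b; $1 = absolute path of it.
    # The merge of extraneous heads, when needed, must be done by hand
    # between the unbundle and the push loop.
    integrate_bundle() {
        update_remote_tracking "$1"
        hg clone "$LOCAL_REF_URL" bundle-integration
        hg -R bundle-integration unbundle "$1"
        hg -R bundle-integration heads   # inspect; merge extra heads here
        until hg -R bundle-integration push; do
            hg -R bundle-integration pull
            # ...merge any heads created meanwhile, then loop...
        done
        rm -rf bundle-integration
    }
    ```

    The functions deliberately mirror the numbered steps one to one, so any of them can still be run by hand when something goes wrong halfway through.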

Pierre Lebeaupin