
I'd like to publish a subset of an existing private repository. Given two repositories, private and public, I want the following:

  • private contains the project's entire history, including confidential information.
  • public should contain a subset of private's history, minus the confidential information.

I can generate a new branch in private that takes the latest changeset and strips away all confidential information, but I don't want to share ancestors of this branch with public.

Question: How do I strip history from public while keeping the repositories related? Meaning, I need to be able to hg pull from public into private.

Update:

  1. What makes this question different from https://stackoverflow.com/a/5516141/14731 is that I need to hide existing ancestors from public (versus hiding new heads).
  2. https://stackoverflow.com/a/4034084/14731 might work, but I'm wondering if there is a better approach than merging against a disjoint head.
Gili

2 Answers


Upon further reflection, I think it makes sense that https://stackoverflow.com/a/4034084/14731 produces a disjoint head, since the remote changeset really does represent a head without ancestors. On the upside, this approach has a minimal disk space cost: the files are not duplicated on disk, and you only pay a little (85k on my end) for the extra metadata.

Here is how to implement this approach:

  1. hg archive to extract the latest changeset from private's sanitized branch into a new directory.
  2. hg init in that directory, followed by hg commit --addremove, to create a new repository (public) whose only changeset is this snapshot.
  3. From private, hg pull --force [public] to pull public (an unrelated repository) into private as a new disjoint branch.
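The three steps above could be scripted roughly as follows. This is only a sketch: the function name `publish_snapshot`, the commit message, and the idea of passing the repository paths and branch name as arguments are all illustrative assumptions, not part of the original answer. The `HG` variable override is there so the commands can be dry-run with `HG=echo`. Disabling `ui.archivemeta` is an optional extra that keeps `hg archive` from writing `.hg_archival.txt`, which records the hash of the private changeset.

```shell
# Hypothetical helper (name and arguments are assumptions, not from the post).
# Usage: publish_snapshot /path/to/private /path/to/public sanitized-branch
publish_snapshot() {
    private_repo=$1   # existing private repository
    public_repo=$2    # directory that will become the public repository
    branch=$3         # sanitized branch inside private
    hg_cmd=${HG:-hg}  # set HG=echo to print the commands instead of running them

    # 1. Export the tip of the sanitized branch as plain files, without history.
    "$hg_cmd" archive --config ui.archivemeta=false \
        -R "$private_repo" -r "$branch" "$public_repo" || return 1

    # 2. Turn the exported files into a new repository with a single changeset.
    "$hg_cmd" init "$public_repo" || return 1
    "$hg_cmd" -R "$public_repo" commit --addremove \
        -m "Initial public snapshot" || return 1

    # 3. Pull the unrelated public repository into private as a disjoint head.
    "$hg_cmd" -R "$private_repo" pull --force "$public_repo"
}
```

Running `HG=echo publish_snapshot private public sanitized` first is a cheap way to review the exact commands before touching either repository.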

At this point you have two options: merge the disjoint head into private's sanitized branch, or leave it unmerged.

Option 1: Merged head

  • Advantages
    • The private repository can see a historical link between private and public.
  • Disadvantages
    • You cannot push changes from private to public, because doing so would push the ancestors you worked so hard to exclude. Why? hg push cannot exclude ancestors of a merge.
    • You need to interact with private directly in order to modify the sanitized branch.
    • Contributing patches from private to public becomes more difficult (since you cannot make use of the history metadata directly).

Option 2: Unmerged head

  • Advantages
    • Ability to push changes from private to public without revealing private changesets. You can do this using hg push -b disjointBranch.
  • Disadvantages
    • You lose the historical link between public and its ancestors in private.
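For Option 2, a cautious push might preview the outgoing changesets before sending them. The sketch below assumes it is run from inside private; the function name `push_public_branch` and its arguments are hypothetical, and the branch name is whatever the disjoint head lives on (`disjointBranch` in the text above).

```shell
# Hypothetical helper; run from inside the private repository.
# Usage: push_public_branch /path/to/public disjointBranch
push_public_branch() {
    public_repo=$1    # destination repository
    branch=$2         # named branch holding the disjoint head
    hg_cmd=${HG:-hg}  # set HG=echo to print the commands instead of running them

    # Preview what would be sent. hg outgoing exits non-zero when there is
    # nothing to push, in which case we stop here.
    "$hg_cmd" outgoing -b "$branch" "$public_repo" || return 1

    # Push only that branch, leaving every other private head behind.
    "$hg_cmd" push -b "$branch" "$public_repo"
}
```

Reviewing the `hg outgoing` listing is the point of the extra step: if any private changeset shows up there, abort before the push.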

I'm still looking for a more elegant solution. If you have a better answer, please post it.

Gili

If your real task is to "hide private data" rather than "show only a small subset of history" (note the difference), you can:

  • Activate and use the MQ extension
  • Convert all changesets that change non-public data into MQ patches
  • Remove from those patches every edit unrelated to private data handling
  • Replace all occurrences of private data in the code with keywords
  • Edit the related patches in the queue (which must now replace the keywords with the real values)
  • Push the "polished" private repo to public (with all MQ patches unapplied beforehand)

To get a "safe push" in the future, add an alias to the private repo that pops any applied patches before pushing, so that only changesets without applied patches are pushed. Something like:

[alias]
# The leading "!" makes this a shell alias; $HG expands to the hg executable
spush = !$HG qpop -a && $HG push

Or, the more modern way, for Mercurial versions that support Phases (2.1 and later): always keep MQ patches in the secret phase (i.e. unpublishable), and stop worrying about whether they are applied or unapplied before a push:

[mq]
secret = True

in the private repo's .hg/hgrc
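To check that the setting took effect, you could list the changesets currently in the secret phase before pushing. This is a sketch rather than part of the original recipe: the function name `list_unpublishable` is hypothetical, and it assumes it is run from inside the private repo with the patches applied.

```shell
# Hypothetical helper; run from inside the private repository.
list_unpublishable() {
    hg_cmd=${HG:-hg}  # set HG=echo to print the command instead of running it

    # secret() selects every changeset in the secret phase; hg push skips these.
    "$hg_cmd" log -r "secret()" \
        --template "{rev}:{node|short} {phase} {desc|firstline}\n"
}
```

Every applied MQ patch should appear in this listing; a patch that is missing from it would still be pushed.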

Lazy Badger