Mercurial: exposing a subset of history to the public

Question

I'd like to publish a subset of an existing private repository to the public. Given two repositories, private and public, I want to do the following:

private contains the project's entire history, including confidential information.
public should contain a subset of private's history, minus the confidential information.

I can generate a new branch in private that takes the latest changeset and strips away all confidential information, but I don't want to share ancestors of this branch with public.

Question: How do I strip history from public while keeping the repositories related? That is, I need to be able to hg pull from public into private.

Update:

What makes this question different from https://stackoverflow.com/a/5516141/14731 is that I need to hide existing ancestors from public (versus hiding new heads).
https://stackoverflow.com/a/4034084/14731 might work, but I'm wondering if there is a better approach than merging against a disjoint head.

score 2 · Accepted Answer · edited May 23 '17 at 12:05

Upon further reflection, I think it makes sense that https://stackoverflow.com/a/4034084/14731 produces a disjoint head, since the remote changeset really does represent a head without ancestors. On the upside, this approach has a minimal disk-space cost: the files are not duplicated on disk, and you only pay a little (about 85 KB in my case) for the extra metadata.

Here is how to implement this approach:

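hg archive to export the latest changeset of private's sanitized branch as plain files (no history).
hg init in the exported directory, then add and commit the files, to create a new repository (public) whose only changeset is that snapshot.
From inside private, hg pull --force <public> to pull public (an unrelated repository) into private as a new, disjoint head. --force is required because the two repositories share no ancestors.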
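A minimal shell sketch of the three steps above, assuming a POSIX shell, a configured Mercurial username, and absolute paths. The function name create_public_repo, its three arguments and the commit message are hypothetical placeholders, not part of the original answer.

```shell
# Hypothetical helper: create_public_repo PRIVATE PUBLIC SANITIZED
#   PRIVATE   - absolute path to the private repository
#   PUBLIC    - absolute path for the new public repository (should not exist yet)
#   SANITIZED - name of the sanitized branch in private
create_public_repo() {
    PRIVATE=$1
    PUBLIC=$2
    SANITIZED=$3

    # Step 1: export a snapshot of the sanitized branch's latest changeset.
    # This writes plain files only; no history is copied.
    hg -R "$PRIVATE" archive -r "$SANITIZED" "$PUBLIC" || return 1
    # hg archive adds a .hg_archival.txt metadata file that records
    # private's changeset hashes, so remove it before committing.
    rm -f "$PUBLIC/.hg_archival.txt"

    # Step 2: turn the snapshot into a new repository with a single root commit.
    (
        cd "$PUBLIC" &&
        hg init &&
        hg addremove &&
        hg commit -m "Initial public snapshot"
    ) || return 1

    # Step 3: from inside private, pull the unrelated public repository.
    # --force is required because the two repositories share no ancestor.
    hg -R "$PRIVATE" pull --force "$PUBLIC"
}
```

For example, create_public_repo /path/to/private /path/to/public sanitized (all three values hypothetical).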

At this point you have two options: merge the disjoint head into private's sanitized branch, or leave it unmerged.

Option 1: Merged head

Advantages
- The private repository can see a historical link between private and public.
Disadvantages
- You cannot push changes from private to public, because doing so would also push the ancestors you worked so hard to exclude. Why? hg push always sends all ancestors of the changesets it pushes, and a merge's ancestors include private's history.
- You need to interact with private directly in order to modify the sanitized branch.
- Contributing patches from private to public becomes more difficult (since you cannot make use of the history metadata directly).
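If you choose Option 1, the merge itself might be sketched as follows. The helper name merge_public_head and its arguments (the sanitized branch name and the revision of the head pulled from public) are hypothetical, and the merge may need conflict resolution if the two snapshots differ.

```shell
# Hypothetical helper: merge_public_head SANITIZED PUBLIC_HEAD
#   SANITIZED   - name of private's sanitized branch
#   PUBLIC_HEAD - revision (hash or number) of the root pulled from public
# Run from inside the private repository with a clean working directory.
merge_public_head() {
    SANITIZED=$1
    PUBLIC_HEAD=$2

    # Work on the tip of the sanitized branch.
    hg update "$SANITIZED" || return 1
    # Merge in the disjoint head; the two heads share no common ancestor.
    hg merge -r "$PUBLIC_HEAD" || return 1
    hg commit -m "Merge public root into $SANITIZED"
}
```

After this merge, a plain hg push from private would send private's ancestors as well, which is the first disadvantage listed above.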

Option 2: Unmerged head

Advantages
- You can push changes from private to public without revealing private changesets, using hg push -b disjointBranch.
Disadvantages
- You lose the historical link between public and its ancestors in private.
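For Option 2, a publishing step might look like the sketch below. It assumes the public history lives on a named branch literally called disjointBranch (for example, created by running hg branch disjointBranch before the root commit in public); push_public_branch and PUBLIC are hypothetical names. hg outgoing previews what would be sent before the actual push.

```shell
# Hypothetical helper: push_public_branch PUBLIC
#   PUBLIC - path or URL of the public repository
# Run from inside the private repository, after committing on disjointBranch.
push_public_branch() {
    PUBLIC=$1

    # Preview the changesets that would be pushed from disjointBranch.
    # (hg outgoing exits with status 1 when there is nothing to push.)
    hg outgoing -b disjointBranch "$PUBLIC"
    # Push only that branch and its ancestors; these stay disjoint from
    # private's history as long as the branch is never merged with it.
    hg push -b disjointBranch "$PUBLIC"
}
```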

I'm still looking for a more elegant solution. If you have a better answer, please post it.

score 1 · Answer 2 · answered Jul 11 '13 at 07:17

If your real task is to "hide private data" rather than to "show only a small subset of history" (note the difference), you can:

Enable and use the MQ extension.
Convert all changesets that change non-public data into MQ patches.
Remove from those patches all edits unrelated to handling the private data.
Replace every occurrence of private data in the code with placeholder keywords.
Edit the related patches in the queue so that they now replace the keywords with the real values.
Push the "polished" private repository to public, after first unapplying all MQ patches.
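A partial sketch of the last MQ steps, assuming the mq extension is enabled in private's .hg/hgrc and that the changeset replacing private data with keywords has already been committed. The helper names capture_private_values and publish_without_patches, the patch name argument and PUBLIC are hypothetical; converting and trimming the earlier changesets is manual work and is not shown.

```shell
# Assumes private's .hg/hgrc contains:
#   [extensions]
#   mq =

# Hypothetical helper: record the real private values as an MQ patch.
# First edit the working copy so the keywords are replaced by the real
# values, then run: capture_private_values PATCH_NAME
capture_private_values() {
    PATCH_NAME=$1
    # qnew creates a new applied patch from all outstanding
    # working-directory changes, on top of the current changeset.
    hg qnew -m "Private values (do not publish)" "$PATCH_NAME"
}

# Hypothetical helper: unapply every MQ patch, then push what remains.
# Usage: publish_without_patches PUBLIC   (path or URL of public)
publish_without_patches() {
    PUBLIC=$1
    hg qpop -a && hg push "$PUBLIC"
}
```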

To make future pushes safe, add an alias to the private repository that unapplies all MQ patches before pushing, so that only changesets without applied patches are pushed. Something like:

[alias]
spush = !$HG qpop -a && $HG push "$@"

Or, the more modern way: with a Mercurial version that supports phases (2.1 and later), always keep MQ patches in the secret phase (i.e. unpublishable), and you no longer need to worry about whether they are applied before pushing:

[mq]
secret = True

in the private repository's .hg/hgrc.
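With [mq] secret = True in place, a quick pre-push check might look like the sketch below (check_before_push and PUBLIC are hypothetical names). The secret() revset lists changesets in the secret phase, which hg push does not send, and hg outgoing shows what a push would actually transmit.

```shell
# Hypothetical helper: check_before_push PUBLIC
#   PUBLIC - path or URL of the public repository
# Run from inside the private repository.
check_before_push() {
    PUBLIC=$1

    # Changesets in the secret phase; applied MQ patches should appear here.
    hg log -r "secret()" --template "{rev}:{node|short} {phase} {desc|firstline}\n"
    # Changesets a plain push to PUBLIC would send; secret ones are excluded.
    hg outgoing "$PUBLIC"
}
```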
