5

Is there any existing tool that can export a Mercurial repository to a Git repository while preserving the commit hashes?

I'm aware of hg-git and fast-export.git, but those create new commits with new hashes (and there doesn't seem to be any option to configure this). We have hundreds of Mercurial repositories hosted on Bitbucket, with a large number of hooks, download links, etc. that depend on exact hashes. Being able to preserve the hashes would save us the considerable effort of updating all those external resources.

JWWalker
Jan

1 Answer

10

It's not possible.

The hash ID of a Git object is a cryptographic checksum of the underlying object data. In the case of a commit object, that's a cryptographic checksum of the string commit, a space, the size in bytes of the rest of the data expressed in decimal, an ASCII NUL, and then the headers, log message text, and trailers.
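As a rough illustration of the paragraph above, here is a minimal Python sketch of how a SHA-1 Git object ID is derived. The function name `git_object_id` and the commit contents (author, timestamp, message) are made up for the example; only the header layout (type, space, decimal size, NUL) comes from the description above.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to an object of this type and body."""
    # Header: type, space, decimal byte length of the body, then an ASCII NUL.
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical commit: the tree is Git's well-known empty tree;
# the identity, timestamp, and message are invented for illustration.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1572289200 +0000\n"
    b"committer A U Thor <author@example.com> 1572289200 +0000\n"
    b"\n"
    b"Initial commit\n"
)

original_id = git_object_id("commit", commit_body)
# Changing any byte (here, one letter of the message) yields an unrelated ID.
changed_id = git_object_id("commit", commit_body.replace(b"Initial", b"initial"))
print(original_id)
print(changed_id)
```

Because every header line and the message feed into the checksum, there is no field you can set to "choose" the resulting ID.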

The hash ID of a Mercurial commit is a cryptographic checksum of an appropriate part of the Mercurial data for that commit (Mercurial's data structures are different so some commit data do not participate in the checksum).
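For comparison, a sketch of the Mercurial side. This assumes the revlog scheme in which a node ID is the SHA-1 of the two parent node IDs (sorted) followed by the revision text; the function name `hg_node_id` and the sample text are hypothetical, and the exact layout of a real changelog entry is not reproduced here.

```python
import hashlib

# Mercurial represents a missing parent as 20 zero bytes (assumption stated above).
NULL_ID = b"\x00" * 20


def hg_node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> str:
    """Sketch of a revlog-style node ID: SHA-1 over sorted parents plus text."""
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + text).hexdigest()


# Made-up revision text, hashed as a root revision (no parents).
sample_text = b"example changeset text\n"
print(hg_node_id(sample_text))
```

The input to the checksum has a different shape from Git's (no `commit <size>\0` header, parents mixed in as raw bytes), so the same logical commit cannot produce the same ID in both systems.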

The only known way today to construct a specific hash ID from some known data (as you would have in a Git commit) is to add a "junk" data area, then spend many CPU-years computing hashes with different contents in the junk data. The team behind the SHAttered attack used 110 GPU-years of compute time to find a single pair of colliding hashes, and that was a collision between two inputs they were free to choose, which is an easier problem than matching one predetermined hash ID.

torek
  • But does Git rely on this fact in any way? If I forked the Git source, changed how it generates hashes (so that I could inject them), and created (or imported) a repo this way, would anything break in such a repo if it were then used from regular Git? – Jan Oct 28 '19 at 19:15
  • Git relies on this in *every* way. (Mercurial relies on the commit hashes for distribution. Git relies on hashes not just for commit *transfer*, but also for commit *existence* and contents: commits contain tree hashes, which contain more hashes, and so on.) – torek Oct 28 '19 at 19:21
  • Hmm, too bad for us :-/. Before I accept: is there any way to export the list of hashes (ideally just the hashes) in a deterministic chronological order, so that we can at least build a reliable map of hg hash -> git hash? – Jan Oct 28 '19 at 19:24
  • Note, by the way, that there is an ongoing project to move Git from SHA-1 to one of the SHA-256 varieties. This is a major internal change with a lot of ramifications. Once it's done, Git might be able to support N different hashes, and perhaps you could add a "mercurial hash" that acts as an auxiliary entity. I'd bet this would be pretty difficult though. – torek Oct 28 '19 at 19:24
  • And even if you could, it wouldn't be a Git repository anymore. It would be a _something else_ repository. GitHub would reject this, for instance, because it would try to verify the hashes that you used, and reject your push because it was corrupt. – Edward Thomson Oct 28 '19 at 22:41
  • Ah, as for exporting the hashes, use `git rev-list` to find them all. Add `--parents` and build a graph. Mercurial's graph should translate directly into Git's graph, in general, so that would allow you to build a map. (But note that hg tags require an extra commit, so some of this depends on whether your importer imports the extra commit.) – torek Oct 28 '19 at 23:02
  • Thanks for the tip. We use tags extensively, so we will need to find an exporter that exports those as separate commits in Git (so that an easy 1:1 hash mapping is preserved). – Jan Oct 30 '19 at 16:26
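The `git rev-list --parents` suggestion from the comments could be sketched roughly as follows. The helper names `parse_rev_list` and `git_commit_graph` are hypothetical; the only Git behavior relied on is that each output line is a commit hash followed by its parent hashes. The extra `--topo-order --reverse --all` options are my own choice, made so that parents are listed before their children across all refs.

```python
import subprocess
from typing import Dict, List


def parse_rev_list(output: str) -> Dict[str, List[str]]:
    """Turn `git rev-list --parents` output into {commit: [parents...]}."""
    graph: Dict[str, List[str]] = {}
    for line in output.splitlines():
        fields = line.split()
        if fields:
            # First field is the commit; any remaining fields are its parents.
            graph[fields[0]] = fields[1:]
    return graph


def git_commit_graph(repo_path: str) -> Dict[str, List[str]]:
    """Run git in repo_path and return its commit graph, parents first."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--parents",
         "--topo-order", "--reverse", "--all"],
        check=True, capture_output=True, text=True,
    )
    return parse_rev_list(result.stdout)


# Tiny made-up example: a root commit, a child, and a merge of both.
sample = "aaa\nbbb aaa\nccc bbb aaa\n"
print(parse_rev_list(sample))
```

Building the same structure from the Mercurial side and walking the two graphs together from the roots would give the hg hash -> git hash map, provided the importer produced one Git commit per Mercurial changeset (including the tag commits mentioned above).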