
Summary: I'm using git clone with --reference to a repository that already has all the appropriate file contents but not the same commits, and I'm expecting it to save network bandwidth and disk space. It doesn't.

I'm converting a repository from SVN. I've done a

cd DIR1; git svn clone $REPO 

I then set up SubGit (very nice, BTW) for $REPO. SubGit creates completely different commits, because the commit messages differ, but the file contents are all the same.

I then do a:

git clone --reference DIR1 $SUBGITREPO DIR2

I'm expecting it to fetch the commit objects but reuse the file and directory objects (blobs and trees) from DIR1. It doesn't do that -- it transfers the full file contents into DIR2.
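For context, the mechanism --reference sets up is the alternates file: DIR2's object lookups fall through to DIR1's object database, which is where the disk savings would normally come from (paths below assume DIR1 is a normal, non-bare clone):

# after the clone, DIR2's repository points at DIR1's object store
cat DIR2/.git/objects/info/alternates
# expected: an absolute path ending in .../DIR1/.git/objects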

After checkout I used git ls-tree to verify that yes, the SHA1s of the files are the same in DIR1 and DIR2.
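Roughly the check I did, sketched here assuming DIR1 and DIR2 are siblings and both have the relevant branch checked out as HEAD:

(cd DIR1 && git ls-tree -r HEAD) | sort > /tmp/dir1-tree
(cd DIR2 && git ls-tree -r HEAD) | sort > /tmp/dir2-tree
# paths and blob SHA1s match, even though the commit SHA1s differ
diff /tmp/dir1-tree /tmp/dir2-tree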

So, why isn't git doing what I expect, and how can I make it do so?

It's not that big of a deal for me to just make a new clone, but folks across the Pacific would love to have some network savings...

TIA

Dewey Sasser

2 Answers


The --reference flag to git clone serves to share Git object data (file contents under version control, trees, commits). What the work space (i.e., the "visible files") in the directory contains, or whether it even exists, is completely irrelevant.
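For example (the repository URL here is just a placeholder), the reference can even be a bare mirror with no checked-out files at all:

# the reference only needs an object database, not a working tree
git clone --bare git://example.com/project.git ref.git
git clone --reference ref.git git://example.com/project.git DIR2
# DIR2/.git/objects/info/alternates now points at ref.git/objects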

vonbrand
  • That's actually my point: The git repository has all the appropriate files. The working tree *is* irrelevant. I have verified these files by looking at the output of git ls-tree -- i.e. seeing that the SHA1s referenced by the commits are the same. – Dewey Sasser Apr 07 '13 at 12:28
  • From more RTFM I have come to the conclusion that git walks the tree of commits and, if there are no common commits, assumes there are no common files -- which is not the case here. The question remains: is there any way to accelerate the checkout given that all the git objects referring to files/directories are present? – Dewey Sasser Apr 07 '13 at 12:29

is there any way to accelerate the checkout given that all the git objects referring to files/directories are present?

Check whether Git 2.23 (Q3 2019) improves the situation and its performance: the tips of refs from the alternate object store can now be used as starting points for the reachability computation.

See commit 39b44ba, commit 709dfa6 (01 Jul 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 68e65de, 19 Jul 2019)

check_everything_connected: assume alternate ref tips are valid

When we receive a remote ref update to sha1 "X", we want to check that we have all of the objects needed by "X".

We can assume that our repository is not currently corrupted, and therefore if we have a ref pointing at "Y", we have all of its objects.
So we can stop our traversal from "X" as soon as we hit "Y".

If we make the same non-corruption assumption about any repositories we use to store alternates, then we can also use their ref tips to shorten the traversal.

This is especially useful when cloning with "--reference", as we otherwise do not have any local refs to check against, and have to traverse the whole history, even though the other side may have sent us few or no objects.

Here are results for the included perf test (which shows off more or less the maximal savings, getting one new commit and sharing the whole history):

Test                        HEAD^             HEAD
--------------------------------------------------------------------
[on git.git]
5600.3: clone --reference   2.94(2.86+0.08)   0.09(0.08+0.01) -96.9%

[on linux.git]
5600.3: clone --reference   45.74(45.34+0.41) 0.36(0.30+0.08) -99.2%
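
So, with Git 2.23 or later, rerunning the original command should at least avoid the long post-clone connectivity walk; whether the network transfer itself shrinks still depends on the commit-based negotiation with the server (command repeated from the question, only the version check added):

git --version   # should report 2.23 or newer for this optimization
git clone --reference DIR1 $SUBGITREPO DIR2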
VonC