I tried running git lfs migrate import --everything --include="*.dll" on a large repository. Before I ran this there were about 70k commits. After running the migration (and expiring reflog and pruning etc) git rev-list --all --count shows around 130k commits. Why are there so many commits added, and what are those commits?

DeCaf
  • Prune the repo. Count again. – user2864740 Sep 20 '20 at 00:31
  • I did run `git reflog expire --expire-unreachable=now --all` and `git gc --prune=now`. – DeCaf Sep 20 '20 at 00:34
  • It's certainly true that you can't change any existing commit, so `git lfs migrate` is by definition going to have to copy some—probably most or all—of the original commits. This in turn implies that the number of commits should be expected to roughly double, which is what you're seeing. However, the number of *reachable* commits from *branch and tag* names should stay the same: `git rev-list --branches --tags --count` should still be about 70k. Not sure what names `--all` is finding that would find the other ~60k. – torek Sep 20 '20 at 04:12
  • @torek Interesting. You are quite right, running `git rev-list --branches --tags --count` returns the expected number of commits. I'm curious what the remaining "things" that `--all` finds are, though. Running `git filter-repo --analyze` also processes about 130k commits, so something here is a little strange. – DeCaf Sep 20 '20 at 09:51
  • Well, try `git for-each-ref` to dump out all the refs. Perhaps `git lfs migrate` does something similar to Git's own (aging but still there) `git filter-branch`, saving all the original names in a namespace like `refs/original/`. – torek Sep 20 '20 at 10:26
  • Thanks! That helped me find the culprits! Not a problem with git lfs migrate as it turned out, but with some weird commits created by `git tfs` which was used to create the repository from a TFVC repository. – DeCaf Sep 20 '20 at 10:35

2 Answers

Check whether git-lfs issue 3238 is the cause in your case.
In short, a tag might still reference an old commit, in which case that commit and all its parents would still be counted by git rev-list.

Check whether some tags still point to old OIDs that were supposed to be migrated by git lfs migrate.
The mapping from old to new OIDs is written to the file given by --object-map=mapping_file.map.txt.
Run the following script inside the Git repository. It makes no changes to the repository until you remove the echo in front of the git tag command.

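MAP_FILE=../mapping_file.map.txt
# List every ref under refs/tags with the object ID it points to.
git for-each-ref --format='%(objectname) %(refname:strip=2)' refs/tags |
while read -r oid tag; do
        # Each map file line has the form "old_oid,new_oid".
        while IFS=, read -r old_oid new_oid; do
                if [[ "$oid" == "$old_oid" ]]; then
                        echo "TAG $tag still pointing to old_oid $old_oid instead of $new_oid"
                        # Remove the leading "echo" to actually move the tag.
                        echo git tag -f "$tag" "$new_oid"
                fi
        done < "$MAP_FILE"
done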

Note, the mapfile comes from this comment:

Under some specific conditions, which I don’t know, lfs migrate import is not able to move refs (tags, branches) from the old commit to the new created one.
As a consequence of this, "git gc" can’t remove the old commits.
Our databases are unfortunately not shareable, but I can try to describe how we got rid of these old commits.

  1. Create a map file (--object-map=) while running lfs migrate import to get a correlation between old commits and new commits created by lfs.
  2. Collect all commits from your original database and the working lfs database (git rev-list --all).
  3. Identify all so-called double commits. These are commits existing in the original and the working lfs database.
  4. Check these double commits for still existing refs (git show -s) and move every found ref manually or by script to the corresponding new commit (from your created map file) with these git commands: git tag -f, git update-ref.
  5. Run git gc --prune=now and check your commit count in your working lfs database.
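
Steps 2 and 3 above could be sketched roughly as follows. This is only an illustration: the function name list_surviving_old_commits is made up here, and it assumes the map file holds one "old_oid,new_oid" pair per line, as the script earlier in this answer does. It prints every old commit ID from the map file that is still reachable from some ref in the migrated repository.

```shell
# Hypothetical helper: print old commit IDs from the map file that are
# still reachable from any ref in the current repository.
# Assumes each map file line has the form "old_oid,new_oid".
list_surviving_old_commits() {
    map_file=$1
    all_commits=$(mktemp)
    # Every commit reachable from any ref, sorted for comm.
    git rev-list --all | LC_ALL=C sort > "$all_commits"
    # Old IDs that were actually rewritten (old differs from new),
    # intersected with the reachable commits.
    awk -F, '$1 != $2 { print $1 }' "$map_file" | LC_ALL=C sort -u |
        LC_ALL=C comm -12 - "$all_commits"
    rm -f "$all_commits"
}
```

Calling list_surviving_old_commits ../mapping_file.map.txt inside the repository and getting any output at all suggests that some ref still points into the old history; git for-each-ref --contains with one of the printed IDs should then show which ref it is.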
VonC
  • As noted in that same issue, other `.gitattributes` (in your case involving `*.dll`) might also be a problem: https://github.com/git-lfs/git-lfs/issues/3238#issuecomment-419914142 – VonC Sep 20 '20 at 10:30

I managed to figure it out with the help of the comments from @torek. As mentioned in a comment above, git rev-list --branches --tags listed the correct number of commits.

The repository was created by using git tfs to convert a TFVC repository to git. Running git for-each-ref listed a bunch of refs under refs/remotes/tfs. These did not show up when running git remote -v, because they were listed as commits. They probably referenced a bunch of the old commits, which were therefore still reachable after the rewrite, and the refs themselves were not updated by git lfs migrate, as one might have expected them to be.
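
For anyone hitting the same thing, a check along these lines can narrow it down. This is a sketch only, and both function names are invented for illustration. The first counts commits that are reachable from some ref but not from any branch or tag; the second lists the refs that are neither branches nor tags.

```shell
# Hypothetical helper: count commits reachable from any ref (--all)
# but not from any branch or tag.
count_commits_outside_branches_and_tags() {
    git rev-list --all --not --branches --tags --count
}

# Hypothetical helper: list refs that are neither local branches nor tags,
# for example refs/remotes/tfs/... in a repository created by git tfs.
list_refs_outside_branches_and_tags() {
    git for-each-ref --format='%(refname)' |
        grep -v -e '^refs/heads/' -e '^refs/tags/' || true
}
```

If the first function prints a number close to the unexpected surplus, the refs printed by the second one are the likely culprits.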

Deleting all of these refs using git update-ref -d, and then doing another gc, seems to have fixed the problem and the repository was back to its original number of commits.
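
Roughly, the cleanup looked like the sketch below. The function name delete_refs_under is made up for illustration, and the default prefix refs/remotes/tfs is just what applied to this repository. It is destructive, so try it on a copy of the repository first.

```shell
# Hypothetical helper: delete every ref under a prefix, then prune.
# The default prefix refs/remotes/tfs is an assumption from this repository.
delete_refs_under() {
    prefix=${1:-refs/remotes/tfs}
    # Feed one "delete <ref>" command per ref to a single update-ref call.
    git for-each-ref --format='delete %(refname)' "$prefix" |
        git update-ref --stdin
    # Drop reflog entries for unreachable commits, then prune them.
    git reflog expire --expire-unreachable=now --all
    git gc --quiet --prune=now
}
```

Afterwards git rev-list --all --count should match git rev-list --branches --tags --count again, provided no other stray refs remain.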

DeCaf