
I recently removed several large files from a repository using BFG's --delete-files option, and the output appeared to be what I expected. The correct files and sizes were reported as deleted from the repo, and the local size reflects these removals.

However, when I push the result and compare it to master, the comparison reports a huge difference that affects several hundred commits. I'm not sure what to make of this or how to understand it; it is too much to go through by hand. I understand that removing the files will restructure the repo, but how can I be sure from the diff that what I intended is what actually happened?

avs099
user8897013
  • How far back did these now deleted files go in terms of commits? – Tim Biegeleisen Dec 13 '17 at 02:22
  • They go far back, probably near the beginning and scattered from there till now. I see there is a commit tree-dirt history graph shown each time I removed a file with BFG, which shows DDD's and mmm's starting at the beginning for the first file and progressing to the right after each file. – user8897013 Dec 13 '17 at 02:45
  • I've never used BFG but I assume that it's rewriting history along the way. This would explain why you see so many commits as having changed. – Tim Biegeleisen Dec 13 '17 at 02:47
  • Is there any way to make sense of what it is doing? This is a critical repo and I can't merge a bunch of diffs I can't explain. – user8897013 Dec 13 '17 at 02:53
  • You rewrote history AFAIK. There is no nice way to remove old large files which go back a long time. – Tim Biegeleisen Dec 13 '17 at 02:54
  • Yes, but those files should not have been independent of any other files. The diff should just be nothing, since the files had been untracked a while ago, but still remained in the version history. I don't see why it would affect so many other files and other unrelated changes. – user8897013 Dec 13 '17 at 02:56
  • Were the files referenced by anything else? Also realize that Git is a repository based version control system. A commit logically contains a snapshot of _every_ file. – Tim Biegeleisen Dec 13 '17 at 02:59
  • Probably yes. The files would have been run. – user8897013 Dec 13 '17 at 04:40
  • @user8897013 Did you get an answer that helped you solve the problem? If so, you can mark it as accepted, which will also benefit others who have a similar question. – Marina Liu Jan 04 '18 at 08:09

2 Answers


It depends on the nature of the diff seen in all those files.

For instance, if they are EOL (end-of-line) differences, that could mean the BFG commit-rewriting process was done locally with a core.autocrlf setting that might have changed the line endings of all the files (in addition to removing some of them).
Once pushed, all the other files would then be shown as "different".
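
One way to check for this, as a minimal sketch that is not part of BFG, is to read the core.autocrlf setting and rerun the diff with end-of-line whitespace ignored. The throwaway repository, file name and demo identity below are assumptions made only so the commands can be run as-is; in a real clone only the three numbered commands matter, with HEAD~1 and HEAD replaced by the two revisions being compared.

```shell
# Throwaway repository (hypothetical) so the commands below can run as-is.
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.name "Demo"
git -C "$demo" config user.email "demo@example.com"
git -C "$demo" config core.autocrlf false   # store files exactly as written

# Commit a file with LF endings, then the same text with CRLF endings.
printf 'one\ntwo\n' > "$demo/file.txt"
git -C "$demo" add file.txt
git -C "$demo" commit -q -m "LF endings"
printf 'one\r\ntwo\r\n' > "$demo/file.txt"
git -C "$demo" commit -q -a -m "CRLF endings"

# 1. Which line-ending conversion is configured in this clone?
autocrlf=$(git -C "$demo" config --get core.autocrlf)

# 2. A plain comparison lists the file as changed...
changed=$(git -C "$demo" diff --name-only HEAD~1 HEAD)

# 3. ...but with whitespace at end of line ignored, count the hunks that remain.
hunks=$(git -C "$demo" diff --ignore-space-at-eol HEAD~1 HEAD | grep -c '^@@' || true)

echo "core.autocrlf=$autocrlf changed=$changed hunks=$hunks"
```

If a file is listed as changed but no hunks remain once end-of-line whitespace is ignored, its differences are line endings only.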

VonC
  • I've gone through the diff and it's definitely code changes, and they aren't even from the latest versions. It looks like diffs for commits scattered randomly through the history, as in commit 212 vs 213 or 500 commits. ASIDE: Does this have anything to do with removing a file (and possibly the commit as well) and then having to recursively re-stack all the commits back on top (done once for each individual file removed)? Is it just going to show every single diff that came up as a result of modifying the commit, because every commit afterwards is involved in the re-stacking process? – user8897013 Dec 13 '17 at 20:05
  • @user8897013 there should be no "stacking" involved. A BFG run ends with a "git push --force", which replaces one history with another. – VonC Dec 13 '17 at 21:56

This happens because java -jar bfg.jar --delete-files filename deletes the specified file from the entire commit history of all branches, but you pushed only one branch to the remote.

Assume the commit history looks like this before using BFG to delete files:

…---A---B---C---D  master, origin/master
         \
          E---F  mybranch, origin/mybranch

When you compare the master branch with mybranch, the commits involved are C, D, E and F.

Now assume the file to delete is test.txt, and that it exists only in commits B and E. When you execute

java -jar bfg.jar --delete-files test.txt

The commit history will be:

…---A---B'---C'---D'  master, origin/master
         \
          E'---F'  mybranch, origin/mybranch

Note: BFG not only rewrites the commits of the local branches (master and mybranch), it also re-points the tracking branches (origin/master and origin/mybranch).

If you run git fetch after that, you will find that the local branches have diverged from their tracking branches:

        E---F    origin/mybranch
       /
      B---C---D  origin/master
     /     
…---A---B'---C'---D'  master
         \
          E'---F'  mybranch
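
The divergence drawn above can also be measured rather than read off a graph. This is a rough sketch under stated assumptions: the "remote" is a stand-in repository in a temporary directory, the demo identity is made up, and the rewrite is simulated by hand with empty commits (BFG itself is not run here).

```shell
work=$(mktemp -d)
# Demo identity (hypothetical) so commits work without any global git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the remote repository: commits A, B, C on master.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
for msg in A B C; do
  git -C "$work/remote" commit -q --allow-empty -m "$msg"
done

# A local clone, then a simulated rewrite of B and C into B' and C'.
git clone -q "$work/remote" "$work/local"
git -C "$work/local" reset -q --hard HEAD~2
for msg in "B'" "C'"; do
  git -C "$work/local" commit -q --allow-empty -m "$msg"
done

# Left number: commits only on local master. Right number: only on origin/master.
counts=$(git -C "$work/local" rev-list --left-right --count master...origin/master)
echo "$counts"
```

In a real clone, only the final rev-list command is needed after git fetch. A non-zero number on both sides means the branch and its tracking branch have diverged.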

If you then force push only mybranch to the remote repo (and do not force push the master branch), the commit history on the remote repo will be:

      B---C---D  master
     /     
…---A---B'---E'---F'  mybranch

So when you compare mybranch with the master branch again, the commits involved will be B, C, D, B', E' and F'.

If you also force push the local master branch to the remote, the number of commits involved in the comparison should be the same as it was before using BFG.
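
A sketch of pushing every rewritten branch in one go and then confirming that no branch still differs from its tracking branch. As assumptions for illustration, the remote is a bare repository in a temporary directory, the demo identity is made up, and the rewrite is simulated by hand rather than by BFG. In a real clone the relevant command is git push --force --all, plus git push --force --tags if tags were rewritten as well.

```shell
work=$(mktemp -d)
# Demo identity (hypothetical) so commits work without any global git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Bare stand-in for the remote repository.
git init -q --bare "$work/remote.git"

# Local repository: A---B---C on master, mybranch pointing at B.
git init -q "$work/local"
git -C "$work/local" symbolic-ref HEAD refs/heads/master
git -C "$work/local" remote add origin "$work/remote.git"
git -C "$work/local" commit -q --allow-empty -m A
git -C "$work/local" commit -q --allow-empty -m B
git -C "$work/local" branch mybranch
git -C "$work/local" commit -q --allow-empty -m C
git -C "$work/local" push -q --all origin

# Simulated rewrite: B and C become B' and C', and mybranch moves to B'.
git -C "$work/local" reset -q --hard HEAD~2
git -C "$work/local" commit -q --allow-empty -m "B'"
git -C "$work/local" branch -f mybranch
git -C "$work/local" commit -q --allow-empty -m "C'"

# Force push every local branch, not just one of them.
git -C "$work/local" push -q --force --all origin

# Each branch should now be 0 ahead of and 0 behind its tracking branch.
result=""
for b in master mybranch; do
  counts=$(git -C "$work/local" rev-list --left-right --count "${b}...origin/${b}" | tr '\t' '/')
  result="${result}${b}:${counts} "
done
echo "$result"
```

Anyone else with a clone of the old history would still need to re-clone or reset to the rewritten branches, otherwise their next push can bring the old commits back.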

Marina Liu
  • Sorry for not checking back in on this. I'm not entirely sure what happened, but someone else ran the utility but with a different file size parameter, and none of the diffs I saw were there. This was very strange, so I'm not sure if it was me or not. – user8897013 Apr 04 '18 at 22:26