
These are the steps I followed:

  1. Created an empty folder.

  2. Mirrored my repository using:

    git clone --mirror git@bitbucket.org:somespace/myrepo.git
    
  3. Got a list of the 10 largest files using the following command:

    git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | sed -n 's/^blob //p' \
    | sort --numeric-sort --key=2 \
    | tail -n 10 \
    | cut -c 1-12,41- \
    | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
    
  4. Say the largest file turned out to be largestFile.log.

  5. Then I ran bfg as below:

     java -jar bfg-1.14.0.jar --delete-files 'largestFile.log'
    
  6. The output of the above command shows the file as successfully deleted:

    Deleted files
    -------------
    
     Filename                       Git id
     ------------------------------------------------
     largestFile 2015-05-18.log | bbaaa106 (1.3 GB)
    
  7. Finally, as advised by the output of step 6 above, I ran this:

    git reflog expire --expire=now --all && git gc --prune=now --aggressive
    
  8. That also completed successfully.

At this point, before pushing, I want to make sure the file was indeed deleted. So I re-ran the command from step 3 above, but its output still shows largestFile.log in the list.

What am I doing wrong? Or what am I missing here?

Can someone please explain or guide me?

Thanks!

Vicky
  • It sounds like the blob object is still in the repository, but not referenced by anything. It won't be part of any further commits going forward, but I'm not sure any of this will remove the object from the remote repository. – chepner Jun 08 '21 at 18:02
  • @chepner Yes, but I want to ensure it's gone locally before doing a push to the remote. – Vicky Jun 08 '21 at 18:03
  • When you push a branch, you are only pushing the transitive closure of objects accessible from the branch head. The stray blob isn't, so it won't be part of the push. – chepner Jun 08 '21 at 18:05
  • You might want to try using `git gc` to see if that deletes the orphaned object. – chepner Jun 08 '21 at 18:15
  • `git gc` says nothing new to pack. – Vicky Jun 09 '21 at 00:58

2 Answers


The fine manual says that without the --no-blob-protection option, the HEAD commit is left unchanged. Is that your issue?

By default the BFG doesn't modify the contents of your latest commit on your master (or 'HEAD') branch, even though it will clean all the commits before it.

That's because your latest commit is likely to be the one that you deploy to production, and a simple deletion of a private credential or a big file is quite likely to result in broken code that no longer has the hard-coded data it expects - you need to fix that, the BFG can't do it for you. Once you've committed your changes- and your latest commit is clean with none of the undesired data in it - you can run the BFG to perform it's simple deletion operations over all your historical commits

...

If you want to turn off the protection (in general, not recommended) you can use the --no-blob-protection flag:

https://rtyley.github.io/bfg-repo-cleaner/
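A quick way to test whether this protection is what keeps the file around is to look for it in the tree of the HEAD commit. The following is only a minimal sketch: `files_in_head` is a hypothetical helper name (not part of git or the BFG), and it assumes you run it inside the mirrored repository.

```shell
# Hypothetical helper (not a git or BFG command): prints every path in the
# tree of the HEAD commit whose name contains the given string.
# Assumption: the current directory is the (mirrored) repository.
files_in_head() {
    # ls-tree -r lists the whole tree recursively; grep -F matches literally
    git ls-tree -r --name-only HEAD | grep -F -- "$1"
}
```

For example, `files_in_head 'largestFile'` printing nothing suggests the file is not in the latest commit, in which case blob protection is probably not the reason it survives.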

Mort
  • The file was not part of the HEAD commit, if that is what you meant. In fact, the file and the directory it was in were deleted and committed long ago. So when I check out a branch from the repository and browse to that path, it's not there. All I want to do is clean up and reduce the size of my repository. – Vicky Jun 09 '21 at 00:55
  • I tried with `--no-blob-protection` as well, with the same result. I can still see the files in the list of big files mentioned in my question. Also, the size of the repository is the same as before. – Vicky Jun 11 '21 at 00:55
  • Weird. Sorry, I have no further ideas. – Mort Jun 11 '21 at 02:19
  • @Vicky Did you ever find a solution to this? Also having a fresh clone here, deleting a simple file with `--no-blob-protection` states it's deleted, but even after running the prune command, it's still there... – Ray May 22 '22 at 12:40
0

I think you need to remove the objects by running the following:

    git reflog expire --expire=now --all && git gc --prune=now --aggressive

This is the only way I have been able to shrink the size of my repo, and pushing and pulling after this operation results in the upstream repo getting smaller as well. Be careful though: this command expires all reflog entries, so afterwards there is no way to recover the previous history using the git reflog command.
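If the file still shows up afterwards, it may help to see which refs still reach it, since `git gc` only prunes objects that no ref can reach. This is a rough sketch rather than a definitive check: `refs_reaching` is a hypothetical helper name, and it matches the given string against each line of `git rev-list --objects` output, so pick a string that cannot be mistaken for part of an object id.

```shell
# Hypothetical helper: prints each ref from which an object whose path
# contains the given string is still reachable.
# Assumption: the current directory is the repository being cleaned.
refs_reaching() {
    git for-each-ref --format='%(refname)' | while read -r ref; do
        # rev-list --objects prints "<id> <path>" for every reachable blob/tree
        if git rev-list --objects "$ref" | grep -qF -- "$1"; then
            printf '%s\n' "$ref"
        fi
    done
}
```

Running `refs_reaching 'largestFile'` and getting no output would suggest nothing references the file any more. `git count-objects -vH` before and after the `git gc` shows whether the packed size actually went down.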

Anmol Jagetia