
So, someone pushed a large file to a repo in our Bitbucket (we use Bitbucket Server, so it's hosted by us). We have deleted the file but want to get rid of it in the history too, as the repo is now quite large to clone.

We can see how to get rid of the large file in a clone of the repo. We have done that using git-filter-repo.

However, this repo is central to our CI system and we can't move or rename it easily. So, I want to perform the same operation directly on the repo used by the Bitbucket server. That is proving tricky. I found where the repo is (thanks to this answer). I logged in to the server, went to $BITBUCKET_HOME/shared/data/repositories/<id> and tried running the git-filter-repo command there, but it failed with:

Parsed 2203 commits
Required environment variable STASH_HOOK_ADDRESS is missing
Required environment variable STASH_HOOK_ADDRESS is missing
fatal: ref updates aborted by hook
fast-import: dumping crash report to fast_import_crash_22581
Error: fast-import failed; see above.

I can't find anything on this error at all. Can anyone help?

I stopped the Bitbucket service and tried again. Same response. When I then started Bitbucket again, it wouldn't start up. However, since everything is virtual and we had taken a snapshot first, we could roll back without any harm. But it still leaves the original question of how to run git-filter-repo (or in some other way clean up the history) on the server.

There is an alternative. I can, I think:

  • Create a mirror clone
  • Use git-filter-repo to remove the file
  • Delete the repo on Bitbucket server
  • Create a new empty repo with the same name on the server
  • Push from our cleaned-up copy up to the server
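As far as the git side goes, the steps above might look roughly like the sketch below. The URL and the file path are placeholders, not values from our setup, and it assumes git-filter-repo is installed so that `git filter-repo` works. Deleting and recreating the repo happens in the Bitbucket UI between the two functions. The push uses the URL directly because git-filter-repo removes the origin remote.

```shell
#!/bin/sh
# Sketch only: REPO_URL and BAD_PATH are placeholders, not real values.
REPO_URL="${REPO_URL:-ssh://git@bitbucket.example.com/proj/repo.git}"
BAD_PATH="${BAD_PATH:-files/bad_file.zip}"

# Mirror-clone every ref, then strip the file from all of history.
clean_mirror() {
    git clone --mirror "$REPO_URL" repo-clean.git &&
    git -C repo-clean.git filter-repo --invert-paths --path "$BAD_PATH"
}

# After the empty repo has been recreated on the server, push all refs back.
push_mirror() {
    git -C repo-clean.git push --mirror "$REPO_URL"
}
```

I would run `clean_mirror` first and check the result locally before deleting anything on the server.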

This cleans up the repo size (which is my main concern - given our CI process clones this repo so much, having it very bloated will be an issue) and I have the history and branches and tags as far as I can see. What I lose is the settings and the history of pull requests, etc. I'd like to keep those if I can - it's really useful to be able to go to an issue in Jira and click the link to even a closed PR and see from the diff exactly what was done. But if I have to choose between fixing the repo size and keeping old PRs then I'll fix the repo size.

Adam
  • 2
    You should be able to (**force**) push into the branches so that you replace them... and make sure that everybody is in sync and the old branches are not used anymore. That is, from a strictly-git perspective. Don't know about other things involved. – eftshift0 Mar 01 '23 at 15:38
  • 1
    Apart from what eftshift0 said, if cloning the repo in CI is the problem, then you should consider switching to a [Blobless, Treeless or shallow clone](https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/). All three would avoid the "huge file in old history" problem, but have different advantages/disadvantages (with blobless being the one that's least likely to cause issues IMO and shallow being the best, if you can live with not having a history). – Joachim Sauer Mar 01 '23 at 16:10
  • 1
    Besides: the linked-to answer contains a fairly strict warning to not do exactly what you're trying to do (i.e. manipulate that repository on-disk without going through BitBucket). A force-push (as "dangerous" as it sounds) is the much safer solution to the problem that you face. – Joachim Sauer Mar 01 '23 at 16:15
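
For reference, the three clone variants mentioned in the comments could be tried in a CI job roughly like this. It is a sketch under assumptions: `REPO_URL` and the `ci_clone` helper are made up for illustration, and the two `--filter` variants need partial-clone support on the server side, which has not been checked for Bitbucket Server here.

```shell
# REPO_URL is a placeholder. ci_clone is a hypothetical helper, not a git command.
REPO_URL="${REPO_URL:-ssh://git@bitbucket.example.com/proj/repo.git}"

ci_clone() {
    # $1 = variant, $2 = target directory
    case "$1" in
        # Full commit and tree history; file contents are fetched on demand.
        blobless) git clone --filter=blob:none "$REPO_URL" "$2" ;;
        # Full commit history; trees and file contents are fetched on demand.
        treeless) git clone --filter=tree:0 "$REPO_URL" "$2" ;;
        # Only the latest commit of the default branch, no history.
        shallow)  git clone --depth=1 "$REPO_URL" "$2" ;;
        *) echo "unknown variant: $1" >&2; return 2 ;;
    esac
}
```

For example, `ci_clone blobless build-src` would leave a working tree in `build-src` without downloading old versions of the large file.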

1 Answer


We fixed this issue by running git-filter-repo on a fresh clone and force pushing the result, in the following order:

  1. fresh clone of the repository

    git clone <url_to_the_repo>

  2. check out the branch with the bad file

    git checkout feature/bad_file

  3. run git-filter-repo on that branch. We had to run it with the --force flag. The git-filter-repo script should be kept outside of your repository folder

    python3 ../git-filter-repo --invert-paths --path-match files/bad_file.zip --force

  4. add the origin remote back to the git config, because git-filter-repo removes it

    git remote add origin <url_to_the_repo>

  5. force push to your branch

    git push --set-upstream origin feature/bad_file --force

After that, the bad file was removed from the history and the repository size decreased. The commit hashes on that branch also changed.
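
To double-check the result before (or after) the force push, something like the following could be run inside the cleaned clone. This is only a sketch: `path_in_history` is a helper name made up here, and the path is the one from step 3.

```shell
# Succeeds (exit 0) if any commit reachable from any ref still touches the path.
path_in_history() {
    [ -n "$(git log --all -1 --format=%H -- "$1")" ]
}

# Example usage inside the cleaned clone:
#   path_in_history files/bad_file.zip && echo "still present" || echo "gone"
#   git count-objects -vH    # "size-pack" shows the packed size on disk
```

If the helper still reports the file, some other ref (another branch or a tag) probably still points at the old history.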

It is best to take a snapshot of the server before doing this operation.

Teodor Mysko