I want to put some legacy code of mine on GitHub. Foolishly, I selected the whole project (in PyCharm) for the initial commit, forgetting that it includes data and plot directories that are huge (8+ GB) and exceed any file or repo size limit. I had hoped I could just remove the unnecessary files and directories later on, but:

The initial push to GitHub fails, and I don't seem to be able to use `git revert`, `git reset`, or the other methods I found here, since there is no previous commit to go back to.

I don't want to risk my code, so I turn to you: how do I either

  1. remove the offending directories and the files therein from the commit but not from my disk, or
  2. ditch this (local) repository and make a new one to connect to GitHub, where I do not include these directories from the get-go?

I looked through many, many answers here, and I just can't seem to find one that fits. Is it that obvious? Thank you all for your help! :)

  • To implement option 2, make a *fresh clone* of the repository (so that the original is independent of the new clone) and then use either `git filter-branch` (obsoleted but still works, just hard to use well) or `git filter-repo` (new, not included in Git yet, much easier to use) to build a *third* repository. Remove the second intermediate one and you have the one you wanted for method 2. Note that when using `git filter-branch`, your second and third repos are all jumbled together in a single `.git` that you have to clean out a bit: another reason to use filter-repo. – torek May 25 '22 at 09:14

1 Answer

Assuming you only have a single commit ("the initial commit"):

    git rm -r --cached <paths to remove>
    git commit --amend

This will remove the files only from the commit, but leave them on the file system.
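As a rough illustration of the single-commit case, here is a minimal sketch that replays the situation in a throwaway repository. The directory names `data` and `plots`, the file names, and the demo identity are assumptions made up for the example, and the `.gitignore` step is an optional extra that is not part of the commands above; it only keeps the directories from being staged again by accident.

```shell
# Scratch demo; "data" and "plots" are hypothetical directory names.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"               # identity for the scratch repo only
git config user.email "demo@example.com"

# Reproduce the mistake: everything ends up in the initial commit.
mkdir data plots
echo "print('hi')" > main.py
echo "big" > data/big.bin
echo "plot" > plots/fig.png
git add .
git commit -q -m "Initial commit"

# Untrack the directories (they stay on disk) and rewrite the single commit.
git rm -r -q --cached data plots
printf 'data/\nplots/\n' > .gitignore     # optional: avoid re-adding them later
git add .gitignore
git commit -q --amend --no-edit

git ls-files      # lists what the amended commit tracks
ls data plots     # the files are still in the working directory
```

The `-r` flag is needed because the paths are directories. Since a push only transfers objects reachable from the pushed branch, the amended commit is what GitHub receives.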

If you already have other commits on top of your initial commit, create a fixup commit that removes those files and then rebase interactively:

    git rm -r --cached <paths to remove>
    git commit --fixup=<commit id of initial commit>
    git rebase -i --autosquash --root

I'm not 100% sure whether rebasing could remove those files accidentally. To be on the safe side, create a backup of the files first or move them to a different directory. (If they are gone from the file system, they should still exist in the object store as blob objects, but restoring them will require some work.)
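One way to follow the "move them to a different directory" advice is sketched below in a throwaway repository with two commits. The directory names `data` and `plots`, the file names, the backup location, and the demo identity are all assumptions for the example. `GIT_SEQUENCE_EDITOR=true` is used here only so the interactive rebase accepts the autosquashed todo list without opening an editor; in a real session you would normally review the list yourself.

```shell
# Scratch demo; all names below are hypothetical.
set -e
repo=$(mktemp -d)      # throwaway repository
backup=$(mktemp -d)    # safe place outside the repository
cd "$repo"
git init -q
git config user.name "Demo"               # identity for the scratch repo only
git config user.email "demo@example.com"

# History with the large directories in the initial commit, plus one more commit.
mkdir data plots
echo "print('hi')" > main.py
echo "big" > data/big.bin
echo "plot" > plots/fig.png
git add .
git commit -q -m "Initial commit"
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Fixup commit that untracks the directories, targeting the root commit.
root=$(git rev-list --max-parents=0 HEAD)
git rm -r -q --cached data plots
git commit -q --fixup="$root"

# Move the directories out of the way, rebase, then move them back.
mv data plots "$backup"/
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root
mv "$backup"/data "$backup"/plots .

git log --oneline -- data plots   # commits in the rewritten history touching these paths
ls data plots                     # the files are back on disk
```

Moving the directories out first means the rebase never touches the only copy of the data, whatever it does to the working tree while replaying the commits.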

knittl