I removed some unreachable and dangling commits in my local repo using

git fsck --unreachable --dangling --no-reflogs
git reflog expire --expire=now --all
git gc --prune=now

But I find the removed commits still available on the origin (GitHub, to be precise).

I tried git push --force but it doesn't synchronize the changes to origin. How do I force sync the changes to origin (have the unreachable/dangling commits removed from remote as well)?

This is a similar question with no answer:

Scope of gc prune and git reflog expire, and related config

ADTC

1 Answer

Short form

You can't dictate how the remote stores its data from the client.

Longer form

First, the place to start is understanding that your local repository is not the same as the remote one. git fsck and git gc operate only on the local repository--which you already knew, since you're asking the question.

Second, Git works by transferring objects. The trick here is that it only talks about reachable objects over the wire, meaning there must be a path from a reference (a branch or a tag) to the object somewhere in the history. If the object being asked for is not reachable, Git will refuse to transfer it to the client, even if it's in the object database. The flip side is that anything you do locally that doesn't involve modifying or updating a reference can't be communicated between the local and remote repositories. You can't say "sync my local object database layout to the remote". You can only say "make the reachable objects between my local and remote the same."
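To make the reachability point concrete, here is a minimal sketch that can be run in a scratch directory. It assumes a POSIX shell and a reasonably recent Git; the directory names, file names, and identity values are made up for the demonstration. It creates a commit, moves the branch away from it so that no ref points at it, and then checks whether a clone over the file:// transport receives the object.

```shell
#!/bin/sh
set -eu

# Scratch area (hypothetical names); nothing here touches a real repository.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/src"
cd "$work/src"
git config user.email "demo@example.com"
git config user.name "Demo"

echo one > file.txt
git add file.txt
git commit -q -m "first"

echo two > file.txt
git commit -q -am "second"
orphan=$(git rev-parse HEAD)

# Move the branch back: "second" is now unreachable from any branch or tag.
git reset -q --hard HEAD~1

# The object is still present in the source object database...
if git cat-file -e "$orphan" 2>/dev/null; then in_source=yes; else in_source=no; fi

# ...but a clone over a real transport only receives reachable objects.
# (file:// is used so that Git negotiates a pack instead of copying files.)
git clone -q "file://$work/src" "$work/clone"
if git -C "$work/clone" cat-file -e "$orphan" 2>/dev/null; then in_clone=yes; else in_clone=no; fi

echo "in source: $in_source, in clone: $in_clone"
```

The source repository never ran gc here, so the commit is still stored there; it simply is not offered to the client because no ref leads to it.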

Finally, how things get represented in GitHub, and whether or not objects get pruned eventually, is entirely up to GitHub. Zach Holman has given a talk on some of the things happening behind the scenes. I imagine they run something in the background to prune dangling objects at some point, but from a remote access standpoint, it really doesn't matter--people cannot access the unreferenced objects. The only issue left is size. I know they're doing some sort of pruning because I've trimmed repositories in the past and decreased their size. You can check this by looking at the size member in the response from the repository API call; try this one as an example: https://api.github.com/repos/jszakmeister/vimfiles.
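As a rough sketch of checking that size member from the command line: the helper name repo_size and the sample JSON below are made up for illustration, and the text matching is deliberately crude (a real JSON parser would be more robust). The live call is left commented out because it needs network access.

```shell
#!/bin/sh
# Print the first numeric "size" field found in a JSON document on stdin.
# Hypothetical helper: it matches text, it does not parse JSON properly.
repo_size() {
    grep -o '"size": *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Offline demonstration with a made-up response fragment (not real data).
sample='{"name": "example", "size": 1234, "fork": false}'
printf '%s\n' "$sample" | repo_size

# Against the live API (requires network access):
# curl -s https://api.github.com/repos/jszakmeister/vimfiles | repo_size
```

Running the live call before and after trimming a repository gives a simple way to see whether the reported size has changed, although when the server-side number updates is up to GitHub.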

If your goal is to shrink the repository size because you checked in objects that are too large, take a look at the Removing sensitive data page from GitHub's help section. It applies equally to large files that you want to permanently remove (simply removing them via a commit doesn't remove them entirely from history).
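To illustrate why deleting a file in a new commit is not enough, here is a sketch in a throwaway repository. It uses git filter-branch because it ships with Git; that choice is an assumption on my part, not necessarily what the GitHub page currently recommends, and Git's own documentation now points to newer tools such as git filter-repo. The file name big.bin and all paths are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway repository (hypothetical names and contents).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init -q "$work/repo"
cd "$work/repo"
git config user.email "demo@example.com"
git config user.name "Demo"

echo keep > notes.txt
echo "pretend this is huge" > big.bin
git add notes.txt big.bin
git commit -q -m "add notes and a large file"

# Deleting the file in a later commit leaves it in history.
git rm -q big.bin
git commit -q -m "remove large file"
before=$(git log --oneline --all -- big.bin | wc -l)

# Rewrite every commit so that big.bin never existed.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
    --prune-empty -- --all >/dev/null 2>&1

# filter-branch keeps backups under refs/original; drop them so the
# old commits are no longer referenced.
git for-each-ref --format='%(refname)' refs/original |
while read -r ref; do git update-ref -d "$ref"; done

after=$(git log --oneline --all -- big.bin | wc -l)
echo "commits touching big.bin: before=$before after=$after"
```

On a real repository the rewritten branches would then have to be force-pushed, and the old objects on the remote are still only removed whenever the host runs its own cleanup.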

If the goal is to reduce repository size by compacting and removing dangling objects, GitHub is already doing their own thing, and you don't really have much control over how that's done. They go to great lengths to keep it small, fast, and efficient though.

John Szakmeister
  • Great info here. In a gist, I cannot force-prune dangling commits in remote. Anyway I contacted GitHub support and they kindly ran a `gc` on their side to remove them. I suppose this is what a bare repo maintainer will need to do upon request if the commits need to be removed on the remote. – ADTC Sep 18 '14 at 11:14
  • "people cannot access the unreferenced objects." this does not seem to be an accurate description of the situation. If I run `git reflog expire --expire=now --all` locally, it will remove log entries referring to dangling references (under `.git/logs`). If I `git clone` a fresh copy of the repo, those dangling references are still present. So it is most definitely possible for people to access data such as usernames and emails associated with the unreferenced objects stored in a repo on GitHub. The commits might be gone, but potentially sensitive information about them remains. – user5359531 Apr 04 '23 at 16:01