Git: How does git decide which files to delete during checkouts?

Question

One thing about Git that I'm trying to understand is how it deals with files which aren't tracked, or which weren't tracked but now are committed.

Here's an example:

echo "one" > one.txt
git add one.txt
git commit -m "#1"

So, "one.txt" is in the first commit. Now, I'll create two more files, but only add one of them (and I'll tag this next commit as "#2" so that we can return to it):

echo "two" > two.txt
echo "three" > three.txt
git add two.txt
git commit -m "#2"
git tag "#2"

So, "two.txt" is in the 2nd commit, and "three.txt" is just this extra file, hanging around. Then, I checkout HEAD~ and, sure enough, Git removes the file "two.txt", because that didn't exist back then...

git checkout HEAD~
ls
___ one.txt three.txt

Okay.. back to the end of the branch, and only add the third file. So, we're only tracking it. Then let's step backward one commit...

git checkout "#2"
git add three.txt
git checkout HEAD~
ls
___ one.txt three.txt

Ummm... okay... so just tracking a file isn't enough to have Git manage whether it's there or not. So, we go to the end of the branch, again, and commit, then back up (twice, this time, to get back to commit #1) and what do we see?

git checkout "#2"
git commit -m "#3"
git checkout HEAD~~
ls
___ one.txt

This time, Git has deleted the file "three.txt". So, my question is: can someone describe just how Git decided to do that? It doesn't seem to be simply whether a filename appears in any tree object in the repository, because I can create a new three.txt (a filename which is being tracked and has a committed version in the repository), do more checkouts, and Git leaves it alone again, like before.

Can someone explain how Git decides what is okay to delete when doing checkouts?

score 0 · Answer 1 · answered Mar 06 '14 at 19:28

Rather than come up with my own explanation, I'll direct you to the excellent Pro Git book, which is free :). That link is to the relevant section, and explains very well the various states.

Your explanation covers pretty comprehensively which files Git cares about. Forgetting about resets for a minute, Git changes the working directory to take it from state X to state Y (by comparing the snapshots recorded in the two commits). If you have a file which you haven't committed, or have altered since it was last committed, Git sees this as a later change, so it won't undo work which isn't committed, even if the file didn't exist in the previous commit.

This is a pretty sane strategy because it lessens the risk of losing uncommitted work.
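
To see that safety rule in action, here is a minimal sketch in a throwaway repository. The temporary directory, the identity settings and the tag name second are assumptions made for the example, not part of the question. It recreates an untracked two.txt while sitting on commit #1 and then tries to go forward to commit #2, which would have to overwrite that file:

```shell
set -e
repo=$(mktemp -d)            # throwaway scratch repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > one.txt
git add one.txt
git commit -q -m '#1'

echo two > two.txt
git add two.txt
git commit -q -m '#2'
git tag second               # hypothetical tag name marking commit #2

git checkout -q HEAD~        # step back; Git removes the committed two.txt
echo "local work" > two.txt  # new, untracked file under the same name

# Going forward again would have to overwrite the untracked two.txt,
# so the checkout should be refused rather than clobber it.
if git checkout -q second 2>/dev/null; then
    result=switched
else
    result=refused
fi
echo "checkout: $result; two.txt contains: $(cat two.txt)"
```

On the Git versions I'm familiar with, the checkout is refused and the untracked file keeps its contents.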

If you want to discard changes in your working directory, git checkout -- . restores the files from the index (replace . with the file/directory of your choosing). To go back to the clean state of the HEAD commit, staged changes included, use git checkout HEAD -- . instead.
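
One detail worth checking for yourself is where the restored content comes from: without a commit argument, git checkout -- . copies from the index, whereas git checkout HEAD -- . copies from the HEAD commit. A small sketch, assuming a scratch repository with made-up file contents:

```shell
set -e
repo=$(mktemp -d)            # throwaway scratch repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > one.txt
git add one.txt
git commit -q -m '#1'

echo staged > one.txt
git add one.txt              # index now holds "staged"
echo unstaged > one.txt      # work tree holds a further, unstaged edit

git checkout -- .            # restore the work tree from the index
from_index=$(cat one.txt)

git checkout HEAD -- .       # restore index and work tree from HEAD
from_head=$(cat one.txt)

echo "from index: $from_index, from HEAD: $from_head"
```

Neither form removes untracked files; they only rewrite paths that the index or the named commit knows about.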

score 0 · Answer 2 · answered Mar 06 '14 at 20:35

One key to understanding this is that git add merely puts the file into the index; that doing a git checkout compares the trees for the "from" and "to" commits (if path P exists in "from" but not in "to", path P is desired-to-be-removed); and that checkout writes through the index into the working directory (clobbering index-only changes). The other is the general principle of "don't overwrite-or-remove anything that is not committed".
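
The tree comparison can be approximated from the command line with git diff between the "from" and "to" commits. This is only a sketch of the idea, not how checkout is implemented internally, and the tag names first and second are invented for the example:

```shell
set -e
repo=$(mktemp -d)            # throwaway scratch repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > one.txt
git add one.txt
git commit -q -m '#1'
git tag first                # hypothetical tag for commit #1

echo two > two.txt
git add two.txt
git commit -q -m '#2'
git tag second               # hypothetical tag for commit #2

# Paths present in the "from" tree (second) but absent from the "to"
# tree (first). --diff-filter=D keeps only those deletions.
to_remove=$(git diff --name-only --diff-filter=D second first)
echo "checkout from second to first would remove: $to_remove"
```

Here the only path that is desired-to-be-removed is two.txt; an untracked or merely staged file never shows up in this comparison, because it is in neither tree.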

Let's take the particular case in question. You were "on" commit #2 with an untracked file three.txt. You then git add three.txt. It's now in the index, but not committed, so it is (still) not in the tree for the current (HEAD) commit, which is the one tagged #2.

Now you ask git to check out commit "#1". Compare the trees: HEAD resolves to "#2", which has files one.txt and two.txt. HEAD~ resolves to "#1", which has one.txt but not two.txt. The recipe for converting the working tree is therefore "remove two.txt, and if one.txt differs, replace it." (And, removal entails removing the index entry as well as the work-tree copy, while replacing entails writing through the index.)

The "verify no work will be clobbered" step then has to check:

What happens to the index versions of two.txt and maybe one.txt?
What happens to the work-tree versions of same?

While three.txt is in the index now, there is no need to remove it or change the index version, so it simply remains in the index now as "ready to be added to a new commit".
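
You can inspect the index and the HEAD tree separately to confirm this. The sketch below assumes a scratch repository that mirrors the question's setup; git ls-files lists the index, and git ls-tree lists the tree of a commit:

```shell
set -e
repo=$(mktemp -d)            # throwaway scratch repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > one.txt
git add one.txt
git commit -q -m '#1'

echo two > two.txt
git add two.txt
git commit -q -m '#2'

echo three > three.txt
git add three.txt            # staged only, never committed

in_index=$(git ls-files)                     # includes three.txt
in_head=$(git ls-tree -r --name-only HEAD)   # does not include three.txt

git checkout -q HEAD~        # move from commit #2 back to commit #1

staged_after=$(git diff --cached --name-only)
echo "still staged after checkout: $staged_after"
ls
```

After the checkout, two.txt is gone, while three.txt is still in the work tree and still staged as a new file.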

Now (with HEAD pointing to #1) you ask to git checkout #2. Git has to repeat the above, but this time the tree comparison result is "add two.txt, and maybe replace one.txt". The verify-no-clobbering step checks that these are OK (they are) and git does the checkout. The index entry for three.txt still remains as "ready to be added to a new commit".

Once you do the git commit -m '#3', git writes the index into a tree (this is relatively easy, much easier than scanning the work directory directly; in effect, the index is a fancied up, intermediate form kind of halfway between a git repo tree entry and a working tree) and writes a new commit with that as its tree.

The last git checkout, then, is done while HEAD resolves to commit #3, and moves from there to commit #1. Comparing the trees, the change from #3 to #1 is "delete two.txt, delete three.txt, maybe replace one.txt". The checkout has to verify that no work will be clobbered (it won't), and then it does that.
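
The whole sequence can be replayed in a scratch repository to watch that final step. As before, this is a sketch: the tag name first is made up, and git diff --name-status only approximates the comparison that checkout makes:

```shell
set -e
repo=$(mktemp -d)            # throwaway scratch repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > one.txt
git add one.txt
git commit -q -m '#1'
git tag first                # hypothetical tag for commit #1

echo two > two.txt
echo three > three.txt
git add two.txt
git commit -q -m '#2'

git add three.txt
git commit -q -m '#3'        # three.txt is now part of a committed tree

# Changes needed to go from the current commit (#3) to commit #1.
plan=$(git diff --name-status HEAD first)
echo "$plan"

git checkout -q first
remaining=$(ls)
echo "left in the work tree: $remaining"
```

Once three.txt belongs to the tree of the commit being left, it is listed as a deletion in the plan and the checkout removes it along with two.txt.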

Note that if you shuffle things around "behind git's back", by changing the commit to which HEAD points without altering the index (git symbolic-ref HEAD refs/heads/differentbranch or git update-ref HEAD HEAD~ for instance), you can get some interesting effects. Here's what I did for an example:

$ git commit -m '#3'
[detached HEAD e3465c4] #3
 1 file changed, 1 insertion(+)
 create mode 100644 three.txt
$ git tag '#3'  # since we're "detached", let's save #3
$ git status
HEAD detached at #3
$ git update-ref HEAD '#2'
$ git status
HEAD detached from #3
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    new file:   three.txt

$ git update-ref HEAD HEAD^
$ git status
HEAD detached from #3
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    new file:   three.txt
    new file:   two.txt

The update-ref operation does not touch the index, but it does change the commit (and hence the tree) to which HEAD points. Thus, comparing index vs current-commit-tree gives different results.
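
The same effect can be reproduced non-interactively. This sketch assumes a scratch repository with three commits and uses git diff --cached, which compares the index against the HEAD commit, to show what counts as staged before and after HEAD is moved:

```shell
set -e
repo=$(mktemp -d)            # throwaway scratch repository (hypothetical)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > one.txt
git add one.txt
git commit -q -m '#1'
echo two > two.txt
git add two.txt
git commit -q -m '#2'
echo three > three.txt
git add three.txt
git commit -q -m '#3'

git checkout -q --detach     # detach so that only HEAD itself is moved

before=$(git diff --cached --name-only)   # index matches HEAD: nothing staged

git update-ref HEAD HEAD~2   # point HEAD at commit #1, index untouched

after=$(git diff --cached --name-only)    # index now differs from HEAD
echo "staged after moving HEAD:"
echo "$after"
```

Nothing in the index or the work tree changed between the two comparisons; only the commit on the other side of the comparison did.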
