This is a long question. I'm trying to reverse-engineer some basic Git functionalities, and am having some trouble wrapping my head around what git add really does under the hood. I'm already familiar with the three trees of Git, and that the index file is not really a tree but rather a sorted-array representation of the tree.
My original hypothesis is as follows: when git add <pathspec> is run,
- If <pathspec> exists in the working directory:
  - Create an index file from <pathspec> that reflects the state of <pathspec> in the working directory.
  - Overwrite the relevant section of the index file with this (sub-)index.
- If <pathspec> exists only in the current index file:
  - This means <pathspec> has been deleted in the working directory, so...
  - Delete the relevant section of the index file that corresponds to <pathspec>.
- If <pathspec> does not exist in the working directory or the index file:
  - fatal: pathspec <...> did not match any files
This hypothesis reflects a "do what you're told to do" git add that only looks at the path and registers changes at or under this path to the index file. For most cases, this is how the actual git add seems to work.
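To make the hypothesis concrete, here is a minimal sketch of it using plumbing commands. It models only the hypothesis, and only for a single regular-file pathspec; it is not how git add is actually implemented. The name hypothetical_add is made up for illustration, and directories are deliberately not handled.

```shell
#!/bin/sh
# Sketch of the *hypothesis* above for a single regular file.
# hypothetical_add is an illustrative name, not a Git command.
hypothetical_add() {
    path=$1
    if [ -f "$path" ]; then
        # Path exists in the working directory: (re)write its index entry.
        git update-index --add -- "$path"
    elif git ls-files --error-unmatch -- "$path" >/dev/null 2>&1; then
        # Path exists only in the index: drop its entry.
        git update-index --force-remove -- "$path"
    else
        echo "fatal: pathspec '$path' did not match any files" >&2
        return 128
    fi
}

# Try it in a throwaway repository.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
git init -q
touch a
hypothetical_add a
after_add=$(git ls-files)    # index paths after adding "a"
rm a
hypothetical_add a
after_rm=$(git ls-files)     # index paths after "a" was deleted
status_missing=0
hypothetical_add missing 2>/dev/null || status_missing=$?
```

The three branches map one-to-one onto the three bullets of the hypothesis, which is what makes the cases below surprising: none of the branches ever touches an index entry other than the one named by the path.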
But there are some cases that don't seem very straightforward:
1. Replacing a file with a directory
git init
touch somefile
git add . && git commit
rm somefile
mkdir somefile && touch somefile/file
At this point, the index file consists of only a single entry for the somefile file I just deleted, as expected. Now I execute git add. I have two ways of doing this: git add somefile or git add somefile/file. (Obviously I'm excluding the trivial git add . here.)
What I expected:
- git add somefile: equivalent to git add . (removes the old entry and adds the new entry).
- git add somefile/file: only adds an index entry for the new somefile/file.
What actually happens: either of the above commands leads directly to the final state of a single index entry for somefile/file, i.e., both are equivalent to git add .
Here, it feels like git add is not your straightforward "do what you're told to do" command. git add somefile/file seems to peek in and around the provided path, realize that the old somefile is no longer there, and automatically remove its index entry.
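For anyone who wants to check case 1, the following sketch reproduces it in a throwaway repository and prints the index before and after the add. The temporary directory, the commit message, and the demo identity passed with -c are placeholders I chose so the script runs non-interactively; they are not part of the original steps.

```shell
#!/bin/sh
# Reproduce case 1 and print the index before and after the add.
# The identity and commit message below are placeholders.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
touch somefile
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add somefile"

# Replace the file with a directory of the same name.
rm somefile
mkdir somefile && touch somefile/file

before=$(git ls-files --stage)   # columns: mode, object name, stage, path
echo "index before:"; echo "$before"

git add somefile/file

after=$(git ls-files --stage)
echo "index after:"; echo "$after"
```

Swapping the git add somefile/file line for git add somefile gives the other variant described above.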
2. Replacing a directory with a file
git init
mkdir somefile && touch somefile/file
git add . && git commit
rm -r somefile && touch somefile
At this point, the index file contains a single entry for the old somefile/file, as expected. Again, I execute git add in the same two variants.
What I expected:
- git add somefile/file: Normally, remove the entry for the old somefile/file. But if it peeks around, it should also add a new entry for somefile.
- git add somefile: equivalent to git add .
What actually happens:
- git add somefile/file: leads to an empty index file, so it does what I would normally expect it to do!
- git add somefile: as expected, equivalent to git add .
Here, git add behaves as a "do what you're told to do" command. It only picks up the paths and overwrites the appropriate section of the index file with what the working directory reflects. git add somefile/file does not poke around and thus does not automatically add an index entry for somefile.
3. Inconsistent index file
Up to this point, a possible theory could be that git add tries to avoid the case of an inconsistent index file, i.e., an index file that does not represent a valid work tree. But one extra level of nesting leads to exactly that.
git init
touch file1
git add . && git commit
rm file1 && mkdir file1 && mkdir file1/subdir
touch file1/subdir/something
This is similar to case 1, only that the directory here has an extra level of nesting. At this point, the index file consists only of an entry for the old file1, as expected. Again, we now run git add, but with three variants: git add file1, git add file1/subdir, and git add file1/subdir/something.
What I expected:
- git add file1: Equivalent to git add . and leads to a single index entry for file1/subdir/something.
- git add file1/subdir and git add file1/subdir/something: Normally, these should only add an entry for file1/subdir/something (leading to an inconsistent index file). But if the above "no-inconsistent-index" theory is correct, they should also remove the old file1 index entry, thus being equivalent to git add .
What actually happens:
- git add file1: Works as expected, equivalent to git add .
- git add file1/subdir and git add file1/subdir/something: Only add a single entry for file1/subdir/something, leading to an inconsistent index file that cannot be committed.
The inconsistent index file I'm referring to is:
100644 <object addr> 0 file1
100644 <object addr> 0 file1/subdir/something
So just adding another level of nesting seems to stop git add from peeking around as it did in case 1! Note that the path provided to git add didn't matter either: both file1/subdir and file1/subdir/something lead to an inconsistent index file.
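By "inconsistent" I mean that one index path is also a leading directory of another index path. A small sketch of how that condition can be checked mechanically is below. find_df_conflicts is a helper name I made up, and it assumes one path per line with no embedded newlines.

```shell
#!/bin/sh
# Report index paths that are also used as a directory prefix of
# another index path (a file/directory conflict).
# find_df_conflicts is an illustrative helper, not a Git command.
find_df_conflicts() {
    awk '
        { seen[$0] = 1; paths[NR] = $0 }
        END {
            for (i = 1; i <= NR; i++) {
                n = split(paths[i], parts, "/")
                prefix = ""
                # Walk every proper leading directory of this path.
                for (j = 1; j < n; j++) {
                    prefix = (j == 1) ? parts[j] : prefix "/" parts[j]
                    if (prefix in seen)
                        print prefix " conflicts with " paths[i]
                }
            }
        }'
}

# The two paths from the listing above, fed in directly:
conflicts=$(printf '%s\n' file1 file1/subdir/something | find_df_conflicts)
echo "$conflicts"   # prints: file1 conflicts with file1/subdir/something
```

Inside a repository, the same check can be run against the real index with git ls-files | find_df_conflicts.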
The above cases paint a picture of a very complicated git add implementation. Am I missing something here, or is git add really not as simple as it seems?