A work file is modified after it was added to staging. Then the file is committed to git without an add. What should happen?

Question

I noticed that when a modified file is added to staging in git, changed again, and then committed without another add, there is no error or warning. The latest changes in the working file get committed. Is what was initially added to staging thrown out?

$ git init
Initialized empty Git repository in /tmp/test/.git/

/tmp/test (master)
$ git config --global user.name "Your Name"

/tmp/test (master)
$ git config --global user.email "you@example.com"

/tmp/test (master)
$ echo A > my.txt

/tmp/test (master)
$ git add my.txt

/tmp/test (master)
$ git commit -m '1st' my.txt

[master (root-commit) c804a96] 1st
 1 file changed, 1 insertion(+)
 create mode 100644 my.txt

at this point my.txt was committed with 'A'

/tmp/test (master)
$ echo B >> my.txt

/tmp/test (master)
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   my.txt

no changes added to commit (use "git add" and/or "git commit -a")


/tmp/test (master)
$ git diff

The file will have its original line endings in your working directory
diff --git a/my.txt b/my.txt
index f70f10e..35d242b 100644
--- a/my.txt
+++ b/my.txt
@@ -1 +1,2 @@
 A
+B

/tmp/test (master)
$ git add my.txt

at this point work file has additional 'B' and was added to staging

/tmp/test (master)
$ git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   my.txt

/tmp/test (master)
$ git diff

/tmp/test (master)
$ git diff --cached
diff --git a/my.txt b/my.txt
index f70f10e..35d242b 100644
--- a/my.txt
+++ b/my.txt
@@ -1 +1,2 @@
 A
+B

/tmp/test (master)
$ git diff HEAD
diff --git a/my.txt b/my.txt
index f70f10e..35d242b 100644
--- a/my.txt
+++ b/my.txt
@@ -1 +1,2 @@
 A
+B

/tmp/test (master)
$ echo C >> my.txt

at this point 'C' was added to the work file but not added to staging

/tmp/test (master)
$ git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   my.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   my.txt


/tmp/test (master)
$ git diff --cached
diff --git a/my.txt b/my.txt
index f70f10e..35d242b 100644
--- a/my.txt
+++ b/my.txt
@@ -1 +1,2 @@
 A
+B

/tmp/test (master)
$ git diff
diff --git a/my.txt b/my.txt
index 35d242b..b1e6722 100644
--- a/my.txt
+++ b/my.txt
@@ -1,2 +1,3 @@
 A
 B
+C


/tmp/test (master)
$ git commit -m '2nd' my.txt
[master 4f574dc] 2nd
 1 file changed, 2 insertions(+)

at this point commit was done without an 'add'

/tmp/test (master)
$ git status
On branch master
nothing to commit, working tree clean

/tmp/test (master)
$ git diff

/tmp/test (master)
$ git diff --staged

/tmp/test (master)
$ git diff HEAD

/tmp/test (master)
$ cat my.txt
A
B
C

score 5 · Answer 1 · answered Feb 12 '21 at 10:18

From the docs at https://git-scm.com/docs/git-commit :

by listing files as arguments to the commit command (without --interactive or --patch switch), in which case the commit will ignore changes staged in the index, and instead record the current content of the listed files (which must already be known to Git);

(Emphasis mine - phd)
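For contrast, here is a minimal sketch of the same sequence as in the question, but with a plain git commit (no path argument), which should commit only what was staged. The throwaway repository from mktemp and the "Example User" identity are illustrative assumptions, not part of the question.

```shell
# Throwaway repository; the path and identity below are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo A > my.txt && git add my.txt && git commit -q -m '1st'
echo B >> my.txt && git add my.txt    # stage the A+B version
echo C >> my.txt                      # C exists only in the working tree

git commit -q -m '2nd'                # no path given: commit what is staged

committed=$(git show HEAD:my.txt)     # content recorded in the new commit
unstaged=$(git diff --name-only)      # files whose working copy differs from the index
echo "committed: $committed"
echo "still unstaged: $unstaged"
```

With no path on the command line, the commit should contain A and B, while C stays behind as an unstaged change.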

torek · Answer 2 · 2021-02-12T20:16:49.713

phd's answer is correct but deserves some expansion.

If you look at the current documentation for git commit (this evolves over time) it should include the --include and --only options:

-i
--include

Before making a commit out of staged contents so far, stage the contents of paths given on the command line as well. This is usually not what you want unless you are concluding a conflicted merge.

-o
--only

Make a commit by taking the updated working tree contents of the paths specified on the command line, disregarding any contents that have been staged for other paths. This is the default mode of operation of git commit if any paths are given on the command line, in which case this option can be omitted. [snip]

As that last-quoted sentence says, the default action, when adding path names to your git commit command, is to behave as git commit --only. This particular action is achieved in a remarkably complex fashion, which can confound some pre-commit hooks.

The --include behavior is easier to describe, though this easy/simple description is slightly flawed (see below for a thorough and correct description). Using git commit --include with:

$ git add file1.txt
$ git commit --include file2.txt

for instance is essentially equivalent to doing:

$ git add file1.txt
$ git add file2.txt
$ git commit

That is, the --include simply runs git add for you, though with the complication that if the commit fails, these files are magically "un-added".
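A small sketch of that equivalence in a scratch repository follows. The repository location and identity are placeholders, and both files are tracked before the experiment starts, on the assumption that paths named on the git commit command line must already be known to Git (as the documentation quoted in the first answer says for listed files).

```shell
# Scratch repository; path and identity are illustrative.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Both files exist in the first commit, so both are known to Git.
echo 1 > file1.txt && echo 1 > file2.txt
git add file1.txt file2.txt && git commit -q -m base

echo 2 >> file1.txt && echo 2 >> file2.txt
git add file1.txt                                 # only file1.txt is staged
git commit -q --include file2.txt -m "both files" # file2.txt is added for us

changed=$(git diff --name-only HEAD~1 HEAD)       # paths changed by the new commit
leftover=$(git status --porcelain)                # anything still pending
echo "$changed"
```

Both paths should show up in the new commit, and nothing should be left staged or modified afterwards.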

The --only option, however, is much more complicated. There's no simple way to describe it that is also fully correct. To describe both of these properly, we need to understand Git's index in some detail.

Technical details: the index

When Git makes a new commit, it always¹ does so from an index or staging area or cache. These are three words for the same thing. The index / staging-area / cache is the way Git keeps track of what you would like committed. Except for the special case of a conflicted merge,² the index holds your proposed next commit. When you first git checkout or git switch to some commit, Git fills in its index from that commit. So your proposed next commit matches your current commit.

You may have noticed here that I sometimes say the index, or Git's index, as if there is exactly one index, but I also sometimes say an index, as if there can be more than one. The tricky part here is that both are correct: there is one particular distinguished index—the index—but you can have more than one.

Technically, the distinguished index is per-work-tree: if you use git worktree add, you not only add another working tree, but also another distinguished index, for that particular working tree. You can find the file name of the distinguished index with:

git rev-parse --git-path index

which normally prints .git/index, but in an added work-tree, prints something else. If $GIT_INDEX_FILE is set in the environment, it prints this variable's value. This is how Git swaps to some alternate index file—or more precisely, it's the externally available mechanism that you can use, to point Git to some alternate index file, and a way for a pre-commit hook to detect a git commit --only invocation, for instance.
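To make the alternate-index mechanism concrete, here is a hedged sketch that copies the main index to a second file and points Git at it through GIT_INDEX_FILE. The file name alt-index, the scratch repository, and the identity are all made up for illustration.

```shell
# Scratch repository; path and identity are illustrative.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo A > my.txt && git add my.txt && git commit -q -m '1st'

default_index=$(git rev-parse --git-path index)   # the distinguished index
echo "default index: $default_index"

# Hypothetical alternate index, seeded from the main one.
alt_index="$repo/.git/alt-index"
cp .git/index "$alt_index"
# Per the text above, this should report the variable's value.
GIT_INDEX_FILE="$alt_index" git rev-parse --git-path index

echo B >> my.txt
GIT_INDEX_FILE="$alt_index" git add my.txt        # stage B in the alternate index only

staged_main=$(git diff --cached --name-only)
staged_alt=$(GIT_INDEX_FILE="$alt_index" git diff --cached --name-only)
echo "staged in main index: '$staged_main'"
echo "staged in alt index:  '$staged_alt'"
```

The main index should report nothing staged, while the alternate index holds the updated my.txt.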

When you run git add, Git finds the existing entry, in the index, for the file you're git add-ing:

If there is no existing entry—if this is a new file—Git copies the file into Git's index and now there is an existing entry: your proposed new commit has a newly added file, as compared to the current commit.
Otherwise, there is some existing file in Git's index. Git boots this file out of its index, and copies the work-tree version of the file into its index. If this copy of the file is different from the copy in the current commit, git status will now say that the file is staged for commit.

So, git add simply updates your proposed next commit, which—at all times (but see footnote 2)—holds a copy of every file that Git will snapshot. The copy that's in the index is the one git commit will use.
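One way to watch git add replace an index entry is to compare the blob hash stored in the index before and after. The sketch below does that in a scratch repository; the helper name index_hash, the repository path, and the identity are assumptions for illustration.

```shell
# Scratch repository; path and identity are illustrative.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo A > my.txt && git add my.txt && git commit -q -m '1st'

# Hypothetical helper: print the blob hash the index holds for a path.
# "git ls-files -s" prints: <mode> <object> <stage> TAB <path>
index_hash() { git ls-files -s -- "$1" | awk '{print $2}'; }

echo B >> my.txt
before=$(index_hash my.txt)          # still the committed version
git add my.txt                       # copy the working-tree version into the index
after=$(index_hash my.txt)
worktree=$(git hash-object my.txt)   # hash of the working-tree content

echo "index before add: $before"
echo "index after add:  $after"
echo "working tree:     $worktree"
```

After the add, the index entry should carry the same hash as the working-tree file, which is the copy a later git commit would snapshot.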

Now that we know how the index works, and that Git can use some extra, temporary index files that we can create, now we can really see how git commit --include and git commit --only work.

¹This is correct for git commit, but if you use git commit-tree you can bypass the need for the index. You must supply, to git commit-tree, the hash ID of the tree. Where will you get that tree? If you use git write-tree, that uses the index. But you can get a tree from somewhere else, by, e.g., just using some existing tree, or using git mktree. Note, however, that with git mktree you can build incorrect trees; the resulting commit will be impossible to check out.

²During a conflicted merge, Git expands the index. This expanded index cannot be written out: git write-tree complains and aborts. Using git add or git rm, you replace the expanded index entries with normal entries, or remove some entries entirely. Once there are no expanded, non-zero-stage entries left, the conflicts are all resolved, because git write-tree can now write out the index: committing becomes possible again.
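Footnote 2 can be observed directly. The following sketch provokes a conflict in a scratch repository and inspects the index; the branch name side, the file name, and the identity are illustrative assumptions.

```shell
# Scratch repository; path and identity are illustrative.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > f.txt && git add f.txt && git commit -q -m base
main=$(git symbolic-ref --short HEAD)        # whatever the default branch is called

git checkout -q -b side
echo side > f.txt && git commit -q -a -m side
git checkout -q "$main"
echo main > f.txt && git commit -q -a -m main

git merge side >/dev/null 2>&1 || true       # expected to stop with a conflict

# Stage numbers of the unmerged entries for f.txt (third field of ls-files -u).
stages=$(echo $(git ls-files -u f.txt | awk '{print $3}'))
if git write-tree >/dev/null 2>&1; then wrote_during=yes; else wrote_during=no; fi

echo resolved > f.txt && git add f.txt       # replace the staged entries with a normal one
if git write-tree >/dev/null 2>&1; then wrote_after=yes; else wrote_after=no; fi

echo "stages: $stages, write-tree during conflict: $wrote_during, after add: $wrote_after"
```

While the conflict is unresolved the index holds non-zero-stage entries and git write-tree refuses to run; after git add it works again.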

Technical details: `--include` and `--only`

To implement git commit --include, Git does this, more or less:

1. copies the index to a temporary one ("an" index);
2. runs git add on the temporary index, with the files you're include-ing;
3. attempts the commit.

An attempted commit can succeed—creating a new commit and updating the current branch name—or it can fail. The commit fails, for instance, if git commit runs your editor and then you choose to delete the entire commit message. Perhaps you were looking at something and realized you shouldn't commit yet, so you did that. Or, the commit fails if the pre-commit hook decides that this commit is not ready yet. Note that the pre-commit hook should look at the temporary index here! It should not look at the files in your working tree. That's not necessarily what will be in the commit. Your proposed next commit is now whatever is in the temporary index.

If the commit fails, Git simply removes the temporary index. The original index—the index—is untouched, so everything is now back the way it was. The git adds in step 2 are magically undone.

If the commit succeeds, Git simply replaces the index with the temporary index. Now the index and the current commit—which is the one we just made—match, so that nothing is "staged for commit". That's how we like it.
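The failure case can be checked with a pre-commit hook that always rejects. This is only a sketch: the scratch repository, the identity, and the deliberately unhelpful hook are all made up for the experiment, and core.hooksPath is set locally just in case a global setting would otherwise redirect hooks.

```shell
# Scratch repository; path and identity are illustrative.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config core.hooksPath .git/hooks    # make sure the hook below is the one that runs
echo 1 > file1.txt && echo 1 > file2.txt
git add file1.txt file2.txt && git commit -q -m base

# Hook that rejects every commit, installed after the base commit exists.
printf '#!/bin/sh\nexit 1\n' > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

echo 2 >> file1.txt && echo 2 >> file2.txt
git add file1.txt                        # only file1.txt is staged
if git commit -q --include file2.txt -m "rejected" 2>/dev/null
then outcome=committed
else outcome=failed
fi

staged_after=$(git diff --cached --name-only)   # what the main index still proposes
echo "commit $outcome; staged afterwards: $staged_after"
```

After the rejected commit, only file1.txt should still be staged, as if the --include never happened.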

Implementing git commit --only is harder. There are still two cases: the commit can fail, or the commit can succeed. For the "fail" case, we want to have the same thing happen as for git commit --include: the index, the main distinguished one, is undisturbed, as if we didn't even attempt to run git commit. But, for the success case, git commit --only is tricky (and the documentation is, I think, slightly inadequate).

Suppose we do this:

$ git checkout somebranch         # extract a commit that has files
$ echo new file > newfile.txt     # create an all-new file
$ git add newfile.txt             # stage the all-new file (copy into index)
$ echo mod 1 >> file.txt          # append a line to an existing file
$ git add file.txt                # stage the updated file (copy into index)
$ echo mod 2 >> file.txt          # append *another* line to the file
$ git commit --only file.txt -m "test"

What would we like as the outcome, if this succeeds? We told Git to commit the two-line addition. Our working tree copy of the file is the two-added-lines version. Should the staged file, proposed for next commit after our test commit, have just the one added line? Or should it have both added lines?

Git's answer to this question is that it should have both added lines. That is, if the git commit works, git status should now say nothing about file.txt; it should only say that newfile.txt is a new file. The two-added-lines version of the file must therefore be the one in the proposed next commit, at this point. (You might agree with Git, or disagree with it, but that's what the Git authors chose to have as the result.)

What this means is that we need three versions of the index at the point of git commit --only attempting to make the commit:³

One—the original index—will have the new file in it, and the one added line.
One—the index to be used by git commit to make the new commit—will not have the new file in it, but will have the two added lines to file.txt.
The last one will have the new file in it, and the two added lines to file.txt in it.

The middle one of these three is the one git commit will use when attempting to make the new commit. That has the two added lines, but not the new file: it's the git commit --only action, in action.

If the commit fails, git commit simply removes both of the temporary index files, leaving the original index—the index—undisturbed. We now have one added line in the proposed next commit's version of file.txt, and we have the newly added file in the proposed next commit as well, as if we never ran git commit --only file.txt at all.

If the commit succeeds, git commit makes the last index—which has both the newly added file, and the two-added-lines version of file.txt—become the (main / distinguished) index. The original index and the temporary index used for doing the commit both get removed.
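The success case of the scenario above can be reproduced in a scratch repository. The repository path, identity, and the initial "line 0" content are illustrative assumptions; the rest follows the commands shown earlier.

```shell
# Scratch repository; path and identity are illustrative.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "line 0" > file.txt && git add file.txt && git commit -q -m base

echo new file > newfile.txt     # create an all-new file
git add newfile.txt             # stage it
echo mod 1 >> file.txt          # first appended line
git add file.txt                # stage the one-added-line version
echo mod 2 >> file.txt          # second appended line, working tree only
git commit -q --only file.txt -m "test"

committed_mods=$(git show HEAD:file.txt | grep -c '^mod')   # added lines in the commit
committed_paths=$(git ls-tree --name-only HEAD)             # paths in the commit
status_after=$(git status --porcelain)                      # what is still pending
echo "mods committed: $committed_mods"
echo "paths in commit: $committed_paths"
echo "status: $status_after"
```

The commit should contain both added lines but not newfile.txt, and the only remaining staged change should be the new file.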

This is what makes git commit --only so complicated. Suppose you're writing a pre-commit hook yourself, and in this pre-commit hook, you plan to do two things:

1. Use a linter to make sure that there are no obvious bugs in any of the code that is to be committed (pylint, pep8, go vet, etc.).
2. Use a formatter to make sure that the code conforms to the project's standard (black, go fmt, etc.).

(In my opinion, step 2 is a mistake: don't do it. But others like the idea.)

We now have three cases:

1. You're doing a normal git commit. $GIT_INDEX_FILE is not set. There's just one index to worry about. You read the files out of the (normal, everyday, standard) index, into a temporary directory, and lint them there. If the linting fails, you stop and reject the commit. If the linting succeeds, you format the files and git add them back to the (single) index, and let the commit happen.

There's still a big problem here because the files that just got committed are the ones that were staged, not the ones in the user's working tree. You can, perhaps, check the working tree files against the pre-updated, not-yet-formatted ones in the index, before git adding any formatting updates. If the working tree files match the index copies, it might be safe to reformat the working tree copies here too.

2. You're doing a git commit --include. There are two index files to worry about, but for linting purposes, you simply read the ones out of the index file that Git is using now for this commit, which is in $GIT_INDEX_FILE (which generally names .git/index.lock at this point).⁴

You can treat this as before, because any formatting you do will go into the proposed commit, and it's just as safe to wreck the user's working tree files as last time. You've already rejected the commit (and not done any formatting, presumably) if you're going to reject the commit; and if the commit succeeds, as you think it will, the user's --include files should be formatted too, after all. On success, any updates you make to the temporary index will be in the real index, because the temporary index becomes the real index.

3. You're doing a git commit --only. There are now three index files to worry about. One of them, the one git commit is going to use, is in $GIT_INDEX_FILE. One of them, the one git commit plans to use to replace the main / distinguished index, is in a file whose name you don't know. The third one, the one that Git will drop back to on failure, is the standard main index.

You can do your checking as usual: lint / vet the files that are in $GIT_INDEX_FILE. That's the commit the user is proposing to make, after all.

But now, if you format those files and add them to $GIT_INDEX_FILE ... well, the formatted files are the ones that will get committed. That's all well and good. But you also need to git add those formatted files to the temporary index file whose name you don't know! And, when it comes to checking the working tree files against some index copies, you probably should use the copies that are in that temporary index file whose name you don't know.

If you don't change any files, but simply lint / vet them all and check for the desired formatting, these problems go away. So it's best to just check stuff. If the user wants the working tree files formatted according to the project's rules, provide them with a working-tree-files-formatter. Let them run that, and then let them run git add on the updated files (or, if you really must, offer to add back the formatted files in the formatting script).
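A check-only hook along these lines might look like the sketch below. It exports the proposed commit from whichever index Git is using for the commit (git checkout-index honors GIT_INDEX_FILE) and inspects the copies without modifying anything. The marker string FIXME-BEFORE-COMMIT is a made-up stand-in for a real linter, and the scratch repository and identity are illustrative as well.

```shell
# Scratch repository; path and identity are illustrative.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config core.hooksPath .git/hooks    # make sure the hook below is the one that runs

cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Check-only hook: copy the proposed commit out of the index in use,
# inspect the copies, and never touch the index or the working tree.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
git checkout-index --all --prefix="$tmp/"
# Hypothetical stand-in for a linter: reject a forbidden marker.
if grep -rq 'FIXME-BEFORE-COMMIT' "$tmp"; then
    echo "pre-commit: forbidden marker in proposed commit" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

echo "clean" > code.txt && git add code.txt && git commit -q -m base
echo "also clean" >> code.txt && git add code.txt   # staged version passes the check
echo "FIXME-BEFORE-COMMIT" >> code.txt               # working-tree version does not

# --only proposes the working-tree version, so the hook should reject it.
if git commit -q --only code.txt -m "worktree version" 2>/dev/null
then only_result=accepted; else only_result=rejected; fi

# A plain commit proposes the staged version, so the hook should accept it.
if git commit -q -m "staged version" 2>/dev/null
then plain_result=accepted; else plain_result=rejected; fi

echo "--only: $only_result, plain: $plain_result"
```

Because the hook reads the index rather than the working tree, it judges exactly what each kind of commit would record, and since it changes nothing, the three-index bookkeeping never becomes its problem.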

I've worked with a project where the pre-commit hook did check and then, if the formatting was wrong, checked $GIT_INDEX_FILE and would stop and do nothing for the tough cases, or offer to git add the reformatted files. That too is an option, but it's a little bit risky since it's possible that Git will change some behavior, making the $GIT_INDEX_FILE test fail somehow.

³There are no doubt other ways to achieve the desired result, but given the fact that the index file is actually a file (at least initially), plus a bunch of existing Git code, this three-index-files trick was the one to use.

⁴This was the case the last time I tested all this, but that was quite a while ago—before the existence of git worktree add, which will clearly affect this, at least for added work-trees.

score -1 · Answer 3 · answered Feb 14 '21 at 00:52

Thank you torek for providing more than enough context to clear up the confusion. There is more on it in "What's the difference between git commit <file> and git commit --only?"

The moral of the story is -- Do not specify files when committing unless you know what you are doing.

Typical action:

 git commit -m "commit everything that is currently staged"

Atypical action:

git commit -m "commit only my.txt working file, ignore what is staged" my.txt

Specifying a file in the above commit implies the --only option:

git commit --only -m "commit only my.txt working file, ignore what is staged" my.txt

This --only option says to commit only the file(s) specified in the commit command and ignore what is currently staged. After the commit, staged files will remain as they were, waiting to be committed. I suppose there is a use case for such 'jumping the queue' behavior. But allowing implicit use of the --only option is foolhardy. It is an accident waiting to happen.

Here is an analogy that can help you remember this. You and your friends arrive at a restaurant and are waiting (staged) to be seated together at a large table. The maître d' realizes that 2 people in your group of 12 are actually not part of your group and prioritizes them specifically (--only) to be seated first at a table for 2. Instead of saying 'the next party to be seated, follow me', the maître d' says 'you two, follow me'.

As has already been pointed out, the --include commit option is always explicit and says to also add the work file(s) specified on the command line to whatever else is currently staged.

git commit -m "add almostforgot.txt to what is staged " --include almostforgot.txt

Is equivalent to:

git add almostforgot.txt
git commit -m "now have everything"
