In a Git pre-commit hook, temporarily remove all changes that are not about to be committed

Question

I would like my pre-commit hook to compile the program and run all the automated tests before allowing the commit. The problem is that my working copy is usually not clean when I'm committing. There are unstaged changes or even untracked files that I don't want to commit. Sometimes I even explicitly specify only a few files to commit, which have nothing to do with what is currently staged.

Of course I want to compile and test only the changes that will be committed, ignoring the other ones. There would be 3 steps to it:

1. Remove all changes that won't be committed.
2. Run the tests.
3. Restore all the changes to exactly how they were before the 1st step.

The 1st step could be achieved by running git stash push --include-untracked --keep-index. The stash entry would also help with the 3rd step. However, I don't know what to do when I'm committing an explicit list of files that are not staged.

(The 2nd step is not really a part of the question.)

The 3rd step could theoretically be done with the command git stash pop --index, but this command seems to be prone to conflicts if a file was staged and then changed further without being staged again.

This script creates a repository with some files and changes that cover various corner cases:

#!/usr/bin/env sh

set -e -x

git init test-repo
cd test-repo
git config user.email "you@example.com"
git config user.name "Your Name"

echo foo >old-file-unchanged
echo foo >old-file-changed-staged
echo foo >old-file-changed-unstaged
echo foo >old-file-changed-both
git add .
git commit -m 'previous commit'

echo bar >old-file-changed-staged
echo bar >old-file-changed-both
echo bar >new-file-staged
echo bar >new-file-both
git add .
echo baz >old-file-changed-unstaged
echo baz >old-file-changed-both
echo baz >new-file-both
echo baz >untracked-file

I recommend not doing this at all: look into the `pre-commit` program instead, or consider doing a `git checkout-index` into a temporary directory. All solutions to this problem have their own problems, though, so you're kind of in a pick-your-flaws situation here. — torek, Sep 08 '21 at 23:01
@torek Do you mean the `pre-commit.sample`? Yes, it provides examples how to get the list of cached files but testing individual files is not sufficient for what I'm trying to do. I need to compile everything into a single program to do those tests. As for `git checkout-index`, I'm not entirely clear on what it does. Does it allow to make a clear copy of the to-be-committed state in another directory? If this is the case, I'll pass. There is a lot of ignored `*.o` files in the current directory. If I would need to recompile everything from scratch, it would take significantly too long. — Piotr Siupa, Sep 08 '21 at 23:35
No, not the included sample, the stuff at https://pre-commit.com/ — torek, Sep 09 '21 at 01:11
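
For readers unsure what `git checkout-index` does: the following is a minimal sketch, not something taken from the comments above. It exports the cached (to-be-committed) version of every file into a separate directory and leaves the working directory untouched. The temporary repository, the file name and the variable names are made up for the illustration. As noted in the comments, a build in the exported directory would start without the ignored `*.o` files.

```shell
#!/bin/sh
set -e

# Throwaway repository and export directory, used only for this illustration.
repo=$(mktemp -d)
export_dir=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.email "you@example.com"
git config user.name "Your Name"

echo committed >file
git add file
git commit --quiet -m 'previous commit'

echo staged >file
git add file            # this version is in the cache
echo unstaged >file     # this version exists only in the working directory

# Copy the cached version of every file into $export_dir.
# The trailing slash matters: --prefix is prepended to each path as plain text.
git checkout-index --all --prefix="$export_dir/"

exported=$(cat "$export_dir/file")
working=$(cat file)
printf 'exported: %s, working directory: %s\n' "$exported" "$working"
```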

Piotr Siupa · Accepted Answer · 2021-09-11T12:22:09.383

You were actually quite close to a correct solution.

(In this answer, I'm going to use the word "cache" instead of "stage" because the latter one is too similar to "stash".)

In fact, the trick with using stash would work even if you were to commit files that are not cached. This is because Git changes the cache for the duration of running hooks, so it always contains the correct files. You can check it by adding the command git status to your pre-commit hook.
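
As a rough way to observe this, the sketch below builds a throwaway repository, installs a hook that records the output of git diff --cached --name-only, and then commits an explicitly named file while a different file is cached. The repository layout, the file names and the .git/hook-saw file are assumptions made for the demonstration only. The hook should report just the explicitly committed file.

```shell
#!/bin/sh
set -e

# Throwaway repository, used only for this illustration.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.email "you@example.com"
git config user.name "Your Name"

echo foo >a.txt
echo foo >b.txt
git add .
git commit --quiet -m 'previous commit'

# Hypothetical hook: record which files are in the cache as seen by the hook.
mkdir -p .git/hooks
cat >.git/hooks/pre-commit <<'EOF'
#!/bin/sh
git diff --cached --name-only >.git/hook-saw
EOF
chmod +x .git/hooks/pre-commit

echo bar >a.txt
git add a.txt                             # a.txt is cached ...
echo bar >b.txt
git commit --quiet -m 'only b' -- b.txt   # ... but only b.txt is committed

hook_saw=$(cat .git/hook-saw)
printf 'the hook saw: %s\n' "$hook_saw"
```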

So you can use git stash push --include-untracked --keep-index.

The problem with conflicts when restoring the stash is also quite easily solvable. You already have all the changes backed up in the stash, so there is no risk of losing anything. Just remove all the current changes and apply the stash to a clean slate.

This can be done in two steps. The command git reset --hard will discard all changes to tracked files. The command git clean -d --force will remove all untracked files.

After that you can run git stash pop --index without any risk of conflicts.

A simple hook would look like this:

#!/bin/sh

set -e

git stash push --include-untracked --keep-index --quiet --message='Backed up state for the pre-commit hook (if you can see it, something went wrong)'

# Default to success; the tests below should overwrite this with their exit status.
tests_result=0
#TODO Tests go here

git reset --hard --quiet
git clean -d --force --quiet
git stash pop --index --quiet

exit $tests_result

Let's break it down.

set -e ensures that the script stops immediately in case of an error, so it won't do any further damage. The stash entry with a backup of all changes is created at the beginning, so in case of an error you can take manual control and restore everything.

git stash push --include-untracked --keep-index --quiet --message='...' fulfills two purposes. It creates a backup of all current changes and removes all non-cached changes from the working directory. The flag --include-untracked makes sure that untracked files are also backed up and removed. The flag --keep-index cancels removal of the cached changes from the working directory (but they are still included in the stash).

#TODO Tests go here is where your tests go. Make sure you don't exit the script here. You still need to restore the stashed changes before doing that. Instead of exiting with an error code, store it in the variable tests_result.
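
One possible way to do that is sketched below. The function run_project_tests is a made-up stand-in for a real build and test command, and here it always fails with status 3. The `||` keeps set -e from aborting the script when the tests fail.

```shell
#!/bin/sh
set -e

# Hypothetical stand-in for the real build and test command.
run_project_tests() {
    return 3
}

# Capture the exit status instead of letting set -e abort the script.
tests_result=0
run_project_tests || tests_result=$?

# The script is still running here, so the stash can be restored before exiting.
printf 'tests exited with status %s\n' "$tests_result"
```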

git reset --hard --quiet discards all changes to tracked files. The flag --hard makes sure that both the cache and the working directory are reset to the last commit, so nothing is left behind.

git clean -d --force --quiet removes all the untracked files from the working directory. The flag -d tells Git to also remove untracked directories. The flag --force tells Git that you know what you're doing and that it really is supposed to delete all those files.

git stash pop --index --quiet restores all changes saved in the latest stash and removes the stash entry. The flag --index tells it to restore the cache as well, so it doesn't mix up which changes were cached and which were not.

Disadvantages of this method

This method is only semi-robust, and it should be sufficient for simple use cases. However, there are quite a few corner cases that may break something during real-life usage.

git stash push refuses to work with files that were only added with the flag --intent-to-add. I'm not sure why that is and I haven't found a way to fix it. You can bypass the problem by adding the file without the flag, or at least by adding it as an empty file and leaving only the content uncached.

Git tracks only files, not directories. However, the command git clean can remove directories. As a result, the script will remove empty directories (unless they are ignored).

Files that were added to .gitignore since the last commit will be deleted. I consider this a feature, but if you want to prevent it, you can do so by reversing the order of git reset and git clean. Note that this works only if .gitignore is included in the current commit.

git stash push does not create a new stash entry if there are no changes, but it still returns 0. To handle commits without changes, such as changing the message using --amend, you would need to add some code that checks whether a stash entry was really created and pops it only if it was.
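
One way to make that check, as a sketch: compare what refs/stash points to before and after the push. The function name push_stash_checked and the variable stash_created are made up for this example, and the throwaway repository exists only to exercise both cases.

```shell
#!/bin/sh
set -e

# Throwaway repository, used only for this illustration.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.email "you@example.com"
git config user.name "Your Name"
echo foo >file
git add file
git commit --quiet -m 'previous commit'

# Hypothetical helper: sets stash_created to "yes" or "no".
push_stash_checked() {
    # rev-parse prints nothing and fails if there is no stash entry yet.
    stash_before=$(git rev-parse --quiet --verify refs/stash || true)
    git stash push --include-untracked --keep-index --quiet
    stash_after=$(git rev-parse --quiet --verify refs/stash || true)
    if [ "$stash_before" != "$stash_after" ]
    then
        stash_created=yes
    else
        stash_created=no
    fi
}

push_stash_checked      # nothing to stash in a clean repository
clean_result=$stash_created

echo bar >file          # a non-cached change
push_stash_checked
dirty_result=$stash_created

printf 'clean: %s, dirty: %s\n' "$clean_result" "$dirty_result"
```

In a hook, the restore step would then run git stash pop only when stash_created is "yes".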

Git stash seems to remove the information about the current merge, so using this code on a merge commit will break it. To prevent that, you need to back up the files .git/MERGE_* and restore them after popping the stash.

A robust solution

I've managed to iron out most of the kinks of this method (making the code much longer in the process).

The only remaining problem is removing empty directories and ignored files (as described above). I don't think these are severe enough issues to take time trying to bypass them. (It is doable, though.)

#!/bin/sh

backup_dir='./pre-commit-hook-backup'
if [ -e "$backup_dir" ]
then
    printf '"%s" already exists!\n' "$backup_dir" 1>&2
    exit 1
fi

intent_to_add_list_file="$backup_dir/intent-to-add"
remove_intent_to_add() {
    git diff --name-only --diff-filter=A | tr '\n' '\0' >"$intent_to_add_list_file"
    xargs -0 -r -- git reset --quiet -- <"$intent_to_add_list_file"
}
readd_intent_to_add() {
    xargs -0 -r -- git add --intent-to-add --force -- <"$intent_to_add_list_file"
}

backup_merge_info() {
    echo 'If you can see this, tests in the `pre-commit` hook went wrong. You need to fix this manually.' >"$backup_dir/README"
    find ./.git -name 'MERGE_*' -exec cp {} "$backup_dir" \;
}
restore_merge_info() {
    find "$backup_dir" -name 'MERGE_*' -exec mv {} ./.git \;
}

create_stash() {
    git stash push --include-untracked --keep-index --quiet --message='Backed up state for the pre-commit hook (if you can see it, something went wrong)'
}
restore_stash() {
    git reset --hard --quiet
    git clean -d --force --quiet
    git stash pop --index --quiet
}

run_tests() (
    set +e
    printf 'TODO: Put your tests here.\n' 1>&2
    echo $?
)

set -e
mkdir "$backup_dir"
remove_intent_to_add
backup_merge_info
create_stash
tests_result=$(run_tests)
restore_stash
restore_merge_info
readd_intent_to_add
rm -r "$backup_dir"
exit "$tests_result"

Nice answer. re "not sure why that is", stash checks whether a stash pop will recreate the stashed index, and there is simply no way to write an intent-to-add entry in a committed tree. There is no representation for it; if there were, it'd be an added entry, not an entry-to-be-added later. The intent-to-add entry is there to show various tools that they should regard a work tree file as "tracked" so the various diffing and patching tools show the change and offer to really add it (before the next commit in the series you're building) as an option. — jthill, Sep 08 '21 at 22:16
@jthill An entry of Git stash is comprised of 2 (or three) commits. One of them is for cached files and it is capable of storing a newly added file. Another one is for not cached files and I can't see a reason why it couldn't be able to store a newly added file which would mean it is intended to be added but not in the cache. Am I missing something here? — Piotr Siupa, Sep 08 '21 at 22:25
Yes, the intent-to-add entries *aren't added yet*. The entire point of that index entry is it *isn't* written to the object db. — jthill, Sep 08 '21 at 22:33
@jthill The problem isn't really how they are represented when they are in the working directory. This already works. The problem is that they are not represented in the stash, while they could easily be included in the non-cached commit as new files. `git apply` could convert that back to whatever representation the working directory uses. I'm fine with the explanation that it just isn't implemented yet because other things have higher priority or something. However, if you want to suggest there is some technical reason for that, I still don't see it. — Piotr Siupa, Sep 08 '21 at 22:47
I don't think there is a *technical* reason. Note that intent-to-add was completely broken for several years and nobody noticed; it's not heavily used. (But as jthill notes, you'll turn "intent to add" into "actually added" when you commit—there would need to be a way to store the intent flag and reinstate the weird state later. This is obviously do-able, but it needs some additional information: probably a fourth commit in the stash.) — torek, Sep 08 '21 at 22:58
No, you're not getting it. Stash writes ordinary commits of ordinary trees, which by definition do not contain any "intent to add" entries because writing a tree adds it to the object db, not by any arbitrary definition or authority but by the actual fact of it being written to the object db. The "intent-to-add" entry in the index cannot exist anywhere but the index, not because nobody's "implemented" it but because if it's in a written tree, you've gone beyond "intent to add" that entry to a written tree, you've *actually* added it to a written tree. — jthill, Sep 08 '21 at 22:59
@torek No, you wouldn't ;) The fact that a file is present in the stash commit that represents non-cached changes and not in the stash commit that represents cached changes suffices for the intent-to-add flag. — Piotr Siupa, Sep 08 '21 at 23:06
Ah, good point. This only applies to the stash commits themselves, but when taken as a pair, the absence in `i` plus presence in `w` could only be explained by the `N` flag, yes. — torek, Sep 08 '21 at 23:08
@jthill OK, let's say the db doesn't have any GC mechanism and we don't want to litter it with untracked files. When we use the flag `--include-untracked`, we commit the file anyway, so maybe it could work at least with this flag? — Piotr Siupa, Sep 08 '21 at 23:09
If you want to implement it and try to sell it, have at. I think it's a waste of time, deep into foolish consistency territory: massive amounts of pain, no value to anyone. As it stands, intent-to-add entries are a special case for in-flight rebuilds, and you're trying to eliminate one special case by introducing an entire host of others. — jthill, Sep 08 '21 at 23:32
