Git pull doesn't work and shows local files as modified with no modifications

Question

I am trying to perform a git pull from outside the directory, and I am running this command:

git --git-dir=$WORKDIR/sources/.git pull

The output is "Already up to date", or it appears to pull the changes, but the files on disk are NOT the ones on the remote, even though the git pull output says they should be. git status then shows the files as "modified", as if Git kept the local version after the pull and is telling me I modified it.

How can I fix this problem? I've read about the --work-tree option, but I don't know whether it is related.

Seems to be working now, can you provide any insight on why this option is needed? — Rafael Moreira, May 06 '21 at 13:51

score 2 · Accepted Answer · answered May 06 '21 at 21:36

I am trying to perform a git pull from outside the directory ...

Don't do that. (Why are you doing that?)

git --git-dir=$WORKDIR/sources/.git pull

In a comment, ElpieKay suggests:

Add --work-tree=$WORKDIR/sources

to which you reply:

Seems to be working now

When you use these two options together, Git will:

relocate itself to your working tree, $WORKDIR/sources;
use $WORKDIR/sources/.git as the repository proper; and
run the Git command there.

This pair of options is—aside from overriding any environment $GIT_WORK_TREE and $GIT_DIR variables—the same as doing:

git -C $WORKDIR/sources pull

where -C causes Git to change directories first, before running the git pull. This would be a more typical way to run the git pull from this location: the variant with separate --git-dir and --work-tree options lets you separate the .git from the working tree, in the relatively rare case where that's useful, and overrides any earlier environment variable settings.¹

¹Git itself sets these same environment variables when you use these options. So:

git --git-dir=A --work-tree=B whatever

works exactly the same as:

GIT_DIR=A GIT_WORK_TREE=B git whatever

except that the latter form assumes a POSIX-style shell (command line interpreter).
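
As a rough check of the equivalence described above, the sketch below asks Git where it thinks the top level of the working tree is, using each of the three invocation styles from a directory outside the repository. The temporary directory and the `sources` name are placeholders made up for this demonstration.

```shell
# Build a throwaway repository (hypothetical location) to compare the styles.
tmp=$(mktemp -d)
git init -q "$tmp/sources"
repo=$(cd "$tmp/sources" && pwd -P)   # canonical path, so string comparison is exact

cd "$tmp"   # deliberately stand outside the working tree

# 1. Change directory first, then run the command.
top_c=$(git -C "$repo" rev-parse --show-toplevel)
# 2. Name the Git directory and the working tree explicitly.
top_opts=$(git --git-dir="$repo/.git" --work-tree="$repo" rev-parse --show-toplevel)
# 3. Same as 2, but through environment variables (POSIX-style shell).
top_env=$(GIT_DIR="$repo/.git" GIT_WORK_TREE="$repo" git rev-parse --show-toplevel)

printf '%s\n' "$top_c" "$top_opts" "$top_env"
```

All three lines printed should be the same path, the `sources` directory.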

Further reading (optional)

can you provide any insight on why this option is needed?

A Git repository is really just the stuff in the .git directory. A repository consists primarily of two databases:

One—usually much larger—consists of Git's commits and other internal objects. These are numbered, and retrieved, by their hash IDs. Commit hash IDs in particular are always unique, and to retrieve a commit, Git needs to know its number.
The other database consists of names, such as branch and tag names, each of which maps to one hash ID number. This allows you to use a branch or tag name to retrieve a commit. Adding new commits usually involves updating a branch name, or a remote-tracking name, so as to be able to find the new commits.
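
To make the two databases a little more concrete, here is a minimal sketch in a throwaway repository. The branch name `main`, the file name, and the demo identity are assumptions chosen for the example.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/main   # pick the branch name explicitly
echo hello > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"

# Names database: a branch name maps to exactly one hash ID.
hash=$(git rev-parse refs/heads/main)
# Objects database: the hash ID is the key used to retrieve the object.
type=$(git cat-file -t "$hash")

echo "refs/heads/main -> $hash ($type)"
git for-each-ref   # lists every name together with the hash ID it maps to
```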

Each commit contains a full snapshot of every file, as a sort of read-only archive. Git compresses these snapshots (in multiple ways) and de-duplicates otherwise-duplicated files within the snapshots, so this takes remarkably little space in many cases. This compression technique is not perfect—it tends to fail in the presence of many large, pre-compressed binary files—but it's very good for human-readable files, such as source code for software.

Commits themselves also contain the commit number—or numbers, plural, for merge commits—of their immediate predecessor commit(s). Thus, as long as Git can find the last commits (via branch or other names), Git can use those to find every earlier commit, one step at a time, through backwards-looking chains. This is the history in the repository: History = commits, starting at the end and working backwards.
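
The backwards-looking chain can be observed directly. This sketch makes two commits in a scratch repository and checks that the second one records the first as its parent; the file contents and demo identity are invented for the example.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
# Small helper so the example does not depend on a configured identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }

echo "v1" > file.txt
git add file.txt
commit -m "first"
first=$(git rev-parse HEAD)

echo "version 2" > file.txt
commit -am "second"

# The newest commit stores the hash ID of its predecessor.
parent=$(git rev-parse HEAD~1)
# Walking backwards from HEAD visits every commit in the chain.
count=$(git rev-list --count HEAD)

git log --format='%h parent=%p %s'
```

The root commit has no parent, so its `parent=` field prints empty.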

But there is a catch. These read-only, compressed commits/files are only usable by Git itself. The main thing you can do with these is exchange them with some other Git. You can also use them with git diff or similar, and do some analysis of how the project has changed over time, and so on. But you can't make any progress. You cannot get new work done with just the repository, made up of the two databases.

To get work done, you will need a working tree. Here, you will have Git extract a commit. The extracted commit has the archived (compressed, read-only, Git-ified, useless) files expanded out into their normal everyday (useful) form. We call this a checked out branch or commit. (When and whether we call it a "branch" vs a "commit" is the source of a lot of confusion, because humans are inconsistent here, and sometimes ignorant of the fine details as well. The tricky part is that when we have a branch checked out, we also have a commit checked out. When we have only a commit checked out—via what Git calls a "detached HEAD"—we still have a commit checked out, but not a branch.)

A non-bare repository is defined as a repository that has a working tree. A bare repository is one that lacks a working tree. That's basically all there is to this, but the lack of a working tree means that this repository can receive new commits unconditionally. A Git repository that has a working tree cannot safely receive new commits for the checked out branch. So server (hosting) sites generally store bare repositories.
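
A quick way to see the difference, sketched with two scratch repositories (the directory names are made up for the example):

```shell
tmp=$(mktemp -d)
git init -q "$tmp/with-tree"          # non-bare: working tree plus .git inside it
git init -q --bare "$tmp/server.git"  # bare: the directory itself is the repository

non_bare=$(git -C "$tmp/with-tree" rev-parse --is-bare-repository)
bare=$(git -C "$tmp/server.git" rev-parse --is-bare-repository)

echo "with-tree bare?  $non_bare"
echo "server.git bare? $bare"
ls "$tmp/server.git"   # HEAD, objects, refs, ... sit at the top level, with no .git subdirectory
```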

The git pull command means: First, run git fetch, then run a second Git command to do something with new commits obtained by the git fetch step. The fetch step works fine in a bare repository, but the second command—regardless of which one you pick for git pull to run—needs a working tree.
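
The two steps can be run by hand. The sketch below assumes the second command is a fast-forward merge (one of the options git pull can use) and uses invented repository names, a `main` branch, and a demo identity. Note that the file in the working tree only changes at the second step.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/upstream"
git -C "$tmp/upstream" symbolic-ref HEAD refs/heads/main
# Helper: commit in the upstream repository without relying on a configured identity.
commit() { git -C "$tmp/upstream" -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }

echo "v1" > "$tmp/upstream/file.txt"
git -C "$tmp/upstream" add file.txt
commit -m "first"

git clone -q "$tmp/upstream" "$tmp/clone"

echo "version 2" > "$tmp/upstream/file.txt"
commit -am "second"

cd "$tmp/clone"
git fetch -q origin                  # step 1: obtain the new commit, update origin/main
before=$(cat file.txt)               # working tree is untouched so far
git merge -q --ff-only origin/main   # step 2: needs the working tree
after=$(cat file.txt)

echo "after fetch: $before / after merge: $after"
```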

When you run Git commands, if you have a working tree, you are expected to run them within the working tree. If there is some good reason that you want to run Git from outside that working tree, you can use git -C <working tree path>, as shown above. You can also use $GIT_WORK_TREE or the --work-tree argument.

Furthermore, when you do have a working tree and are not using all these complicated methods to separate the repository proper from the working tree, Git expects a .git directory (or file) to exist in the top level. In fact, in the absence of all these fancy separate-the-two-parts tricks, this is how Git finds the top level of the working tree. Let's say you are in:

/path/to/some/working/dir/ectory

Git will look to see if .git exists in this path. If not, Git will take the ectory part off and try again: is there a /path/to/some/working/dir/.git? If not, Git will take the dir part off and try again: is there a /path/to/some/working/.git? If so, Git has found the top level of your working tree, and the .git here determines where the repository itself resides. That .git is either a file containing the location of the Git directory, or a directory, in which case it is the Git directory itself.
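
The upward search can be watched with git rev-parse. This is a small sketch using a scratch directory laid out like the example path; the directory names are placeholders.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/working"
top=$(cd "$tmp/working" && pwd -P)   # canonical path of the would-be top level

mkdir -p "$top/dir/ectory"
cd "$top/dir/ectory"                 # two levels below the .git directory

found=$(git rev-parse --show-toplevel)      # where the search ended
gitdir=$(git rev-parse --absolute-git-dir)  # the repository proper
prefix=$(git rev-parse --show-prefix)       # current directory relative to the top level

echo "top level: $found"
echo "git dir:   $gitdir"
echo "prefix:    $prefix"
```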

In your case, though, you ran:

git --git-dir=$WORKDIR/sources/.git ...

This tells Git: the Git directory is (whatever $WORKDIR expands to—this expansion is done by the shell, not by Git)/sources/.git. So Git does not have to search for the top level. You did not tell Git where the top level of the working tree was, so it just assumed that your current directory was the top level of the working tree. But in fact, your current directory was something else. Git may thus have damaged various files on the theory that they were from your working tree.
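
The effect is easy to reproduce. In this sketch (scratch paths and a demo identity, all invented), the same clean repository is inspected twice from an unrelated, empty directory. With only --git-dir, Git treats that empty directory as the working tree, so the tracked file looks deleted. In a directory that held a different file of the same name, it would presumably look modified instead, which matches the symptom in the question.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/sources"
cd "$tmp/sources"
echo hello > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"

mkdir "$tmp/elsewhere"
cd "$tmp/elsewhere"   # outside the real working tree

# Only the Git directory is given: the current directory is assumed to be the working tree.
wrong=$(git --git-dir="$tmp/sources/.git" status --porcelain)
# Both are given: Git compares against the real working tree.
right=$(git --git-dir="$tmp/sources/.git" --work-tree="$tmp/sources" status --porcelain)

echo "without --work-tree: '$wrong'"
echo "with --work-tree:    '$right'"
```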

You may be partially rescued by the fact that Git also stores something Git calls its index (or staging area or cache, depending on which part of Git is doing this "calling"). This is actually just a file inside the repository directory, .git/index for instance. (The exact location of the file can vary and sometimes there are additional files, so don't count too much on this one path. Remember .git/index though if it helps to have a concrete model for what "the index" is.) In this index, Git stores information about which files it has checked out. The presence of the index, plus the assumption that you're already in the top level of your working tree, is why git --git-dir=<path> pull is not behaving correctly.
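
For a concrete look at the index, the sketch below stages one file in a scratch repository and lists what the index records about it. The `.git/index` path shown is the usual default for a simple repository like this one, not a guarantee for every setup.

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
echo hello > file.txt
git add file.txt   # records file.txt in the index; no commit is needed

# Each entry: file mode, blob hash ID, stage number, path.
entries=$(git ls-files --stage)
# Ask Git where the index file lives instead of hard-coding the path.
index_path=$(git rev-parse --git-path index)

echo "$entries"
echo "index file: $index_path"
```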
