
I used git checkout <commit_SHA> to visit an earlier commit in the git tree. Git showed me the following message:

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

Does this mean that any changes I make here (even if I commit them) will not be kept when I return to the most recent commit (using git checkout master)?

amiref

2 Answers


Commits made in a detached HEAD state are kept until Git's garbage collection (GC) deletes them.

The documentation on Detached HEAD is pretty good:

It is important to realize that at this point nothing refers to commit f. Eventually commit f (and by extension commit e) will be deleted by the routine Git garbage collection process, unless we create a reference before that happens.

That means you can either check out (switch to) an existing branch, e.g. git switch main, or create a new branch to keep the commits you made in the detached state, with git switch -c newbranch or git checkout -b newbranchname.

Once you create and switch to a new branch, HEAD is no longer detached, and those commits have a reference pointing to them (the new branch name).
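As a rough illustration of the rescue step described above, here is a minimal sketch that assumes a throwaway repository in a temporary directory. The commit messages, the identity settings, and the name newbranch are placeholders, not anything from the question.

```shell
set -e
# Throwaway repository; all names below are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Visit the earlier commit (detached HEAD) and commit there.
git checkout -q --detach HEAD~1
git commit -q --allow-empty -m "experiment"
experiment=$(git rev-parse HEAD)

# Give the new commit a name before leaving, so it stays reachable.
git checkout -q -b newbranch    # newer Git: git switch -c newbranch

# HEAD is attached again, and newbranch points at the experiment commit.
git symbolic-ref --short HEAD
git rev-parse newbranch
```

After this, switching to any other branch is safe: the experiment commit is reachable through newbranch.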

This answer describes how you can disable the automatic garbage collection, if you want to.

kapsiR

Git finds commits by their hash IDs. The hash IDs are those big ugly strings of letters and digits, such as e1cfff676549cdcd702cbac105468723ef2722f4. These look random, but aren't.

If you write down the hash IDs of each of your commits, you can get them back, for a while at least. But what if you make a mistake or typo while copying down these hash IDs? It would be better to have the computer save them.

That's what a branch name does. In fact, though, it only saves one hash ID. So that's all you'd really need to write down. Every time you make a new commit in "detached HEAD" state, you would have to write down the hash ID of the new commit you just made. You could erase the hash ID of any previous commit (though you don't have to).

Here's how it all works. Every commit saves two things:

  • Each commit stores a full snapshot of every file (that Git knows about at the time you, or whoever, make the commit). These files are stored in a special, compressed, read-only, Git-only format, with the files being de-duplicated, so that if a new commit re-uses most of the files from an old commit, they don't actually take any space.

  • And, each commit stores some metadata: information such as your name and email address, and some date-and-time-stamps. In this metadata, Git stores the hash ID of the previous commit, that comes just before the new commit you just made.

So, if we have a chain of commits, all in a row, we can draw them like this:

... <-F <-G <-H

where H stands in for the actual hash ID of the last of these commits. Git can yank commit H back out of its big database-of-all-commits,¹ using the hash ID. That gets Git the saved snapshot, plus the metadata. The metadata stores the raw hash ID of earlier commit G.

Git can use this to yank commit G back out of its database, which gets a different saved snapshot, and the metadata for G ... which includes the hash ID of earlier commit F. So now Git can grab F, which has a snapshot and metadata. This goes on and on: Git works backwards, from the last commit to the first.
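The backwards walk can be watched directly. The following is a small sketch, assuming a scratch repository with three empty commits whose messages F, G and H merely mimic the drawing. The real hash IDs will differ on every run, so none are shown.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for name in F G H; do
    git commit -q --allow-empty -m "$name"
done

h=$(git rev-parse HEAD)      # hash ID of the last commit, H
g=$(git rev-parse "$h^")     # its parent, G
f=$(git rev-parse "$g^")     # G's parent, F

# The raw commit object: tree (snapshot), parent, author, committer, message.
git cat-file -p "$h"

# The same walk, done by Git itself: each line shows a commit and its parent.
git log --format='%h %s (parent: %p)'
```

The "parent" line printed by git cat-file -p is the stored hash ID that lets Git step from H back to G.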

But you, or someone or something anyway, has to get Git this last hash ID. That's where a branch name is useful: a branch name, by definition, stores the last hash ID in the chain. If you:

git checkout somebranch

(or use git switch to do the same) you get something that we might draw like this:

...--F--G--H   <-- somebranch (HEAD)

The special name HEAD remembers which name you told Git to use. The name holds hash ID H. If you make a new commit now, Git will write out a new commit, which gets a new random-looking (but unique and not actually random at all) hash ID, which we'll call I. Git then writes I's hash ID into the name somebranch:

...--F--G--H--I   <-- somebranch (HEAD)

So that's how Git remembers which commit is the last one. It's in the branch name!
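To see the name move, one can compare what somebranch resolves to before and after a commit. This is only a sketch in a scratch repository; somebranch and the single-letter commit messages are taken from the drawings, everything else is assumed.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for name in F G H; do
    git commit -q --allow-empty -m "$name"
done
git branch -m somebranch     # rename whatever the default branch is called

git symbolic-ref HEAD        # HEAD holds a name: refs/heads/somebranch
before=$(git rev-parse somebranch)   # hash ID of H

git commit -q --allow-empty -m "I"
after=$(git rev-parse somebranch)    # hash ID of I

# The name now holds I, and I's parent is the old value, H.
git rev-parse "$after^"
```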


¹ This big database actually holds all of Git's internal objects. Commits are just one of four types of objects. A Git repository is basically two databases: this big one, and a smaller (well, usually smaller) one that maps names, like branch names, to hash IDs. The smaller database lets you find the hash IDs, and the big database holds the commits.


Detached HEAD mode

In detached HEAD mode, you tell Git: Don't store a name in the special name HEAD, store a raw hash ID instead. For instance, let's say you decide to look at historic commit G:

...--F--G   <-- HEAD
         \
          H--I   <-- somebranch

You can now look around at the files that came out of commit G. If you make a new commit now, Git stores the new commit as usual: it gets some big ugly hash ID, unique to it, but we'll call it J:

          J   <-- HEAD
         /
...--F--G
         \
          H--I   <-- somebranch

Now suppose you git checkout somebranch again, to get back to this:

          J   ???
         /
...--F--G
         \
          H--I   <-- somebranch (HEAD)

The name HEAD now holds the name somebranch, rather than the actual hash ID of commit J. How will you find commit J?
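The situation in these drawings can be reproduced in a scratch repository. The sketch below is illustrative only: the commit letters follow the drawings, and the temporary directory and identity settings are assumptions.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for name in F G H I; do
    git commit -q --allow-empty -m "$name"
done
git branch -m somebranch

# Look at historic commit G: HEAD now holds a raw hash ID, not a name.
git checkout -q --detach somebranch~2
git symbolic-ref -q HEAD || echo "HEAD is detached"

# Make commit J on top of G, then go back to the branch.
git commit -q --allow-empty -m "J"
j=$(git rev-parse HEAD)
git checkout -q somebranch

# No branch name leads to J any more, so this prints nothing...
git branch --contains "$j"
# ...although the commit object itself still exists for now.
git cat-file -t "$j"
```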

Reflogs

If you wrote the hash ID down, that's one way to find it. Git will hang on to commit J for at least 30 days by default, and you can look up the hash ID and type it in again. That's ... painful, at best.

Git also saves the hash ID for you in what Git calls a reflog. The reflogs are also kind of painful to use. Run git reflog any time, and Git will show you what's in the HEAD reflog. The hash IDs are the true names of each commit that HEAD pointed to,² whether directly (detached HEAD) or indirectly (through a branch name), in the last 30 or more days. But typically there are hundreds of these, and finding a useful one in the maze of twisty little hash IDs, all alike, is no fun.


² These are abbreviated for display. They also have numbered names, such as HEAD@{3} or HEAD@{14}. The number increments every time Git adds a reflog entry, while the hash ID, abbreviated or full, stays the same, always.
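Here is a sketch of recovering a commit through the HEAD reflog, assuming the simplest case where the lost commit is the place HEAD was just before the last checkout. In a real repository it may sit much deeper in the list. The branch name rescued is a placeholder.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for name in F G H I; do
    git commit -q --allow-empty -m "$name"
done
git branch -m somebranch

# Make commit J in detached HEAD mode, then leave it behind.
git checkout -q --detach somebranch~2
git commit -q --allow-empty -m "J"
j=$(git rev-parse HEAD)
git checkout -q somebranch

# The HEAD reflog lists every commit HEAD has pointed to, newest first.
git reflog

# HEAD@{1} is where HEAD was before the most recent move, which here is J.
recovered=$(git rev-parse 'HEAD@{1}')
git branch rescued "$recovered"
git log -1 --format='%h %s' rescued
```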


So what should you do?

If you don't care about finding your commits again later, just keep working in detached-HEAD mode. If you do care about finding them later, create a new branch name. Branch names are super-cheap: they just hold one of those big ugly hash IDs.

Use git branch newname to create the new branch name newname wherever you are right now. Then use git checkout or git switch to switch to it, so that HEAD is attached to that name. Or, combine these two steps: git checkout -b newname or git switch -c newname means create the name, then check it out / switch to it, all at once.
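The two-step variant has one detail worth seeing: git branch newname alone creates the name but leaves HEAD detached until you switch to it. Below is a minimal sketch in a scratch repository; the commit messages are made up for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
git checkout -q --detach HEAD~1
git commit -q --allow-empty -m "experiment"
tip=$(git rev-parse HEAD)

# Step 1: create the name at the current commit. HEAD is still detached.
git branch newname
git symbolic-ref -q HEAD || echo "still detached"

# Step 2: attach HEAD to the name.
git checkout -q newname      # or: git switch newname
git symbolic-ref --short HEAD
```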

torek
  • Thank you, it was a great and comprehensive answer. I just have a follow-up question: what if a commit is created when we merge two or more branches? Then the branch_name and HEAD would point to the SHA of the new commit, but does this most recent commit point to both its parents or just one of them? – amiref Oct 14 '20 at 10:08
  • @amiref: yes (both): If it's an actual merge commit (as made by `git merge`, but note that `git merge --squash` *doesn't* make a merge commit), then it has two or more parents: that's the definition of a merge commit. The first parent is the usual parent of any commit, and the second is the other commit that participated in the merging action. – torek Oct 14 '20 at 23:04
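The two parents of a merge commit can be inspected directly. The following is a small sketch in a scratch repository; the branch names main and feature and the commit messages are hypothetical, and --no-ff is passed only to make the intent explicit.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "base"
git branch -m main

# Let the two branches diverge by one commit each.
git checkout -q -b feature
git commit -q --allow-empty -m "work on feature"
feature_tip=$(git rev-parse HEAD)
git checkout -q main
git commit -q --allow-empty -m "work on main"
main_tip=$(git rev-parse HEAD)

# A real merge commit records both tips as parents.
git merge -q --no-ff -m "merge feature" feature
git log -1 --format='parents: %P'
first=$(git rev-parse HEAD^1)    # the commit main was on before the merge
second=$(git rev-parse HEAD^2)   # the commit that was merged in
```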