Git finds commits by their hash IDs. The hash IDs are those big ugly strings of letters and digits, such as e1cfff676549cdcd702cbac105468723ef2722f4. These look random, but aren't.
If you write down the hash IDs of each of your commits, you can get them back, for a while at least. But what if you make a mistake or typo while copying down these hash IDs? It would be better to have the computer save them.
That's what a branch name does. In fact, though, it only saves one hash ID. So that's all you'd really need to write down. Every time you make a new commit in "detached HEAD" state, you would have to write down the hash ID of the new commit you just made. You could erase the hash ID of any previous commit (though you don't have to).
Here's how it all works. Every commit saves two things:
Each commit stores a full snapshot of every file (that Git knows about at the time you, or whoever, make the commit). These files are stored in a special, compressed, read-only, Git-only format, with the files being de-duplicated, so that if a new commit re-uses most of the files from an old commit, they don't actually take any space.
And, each commit stores some metadata: information such as your name and email address, and some date-and-time-stamps. In this metadata, Git stores the hash ID of the previous commit, that comes just before the new commit you just made.
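To see both parts for yourself, you can ask Git to print a raw commit object. This is only a sketch in a throwaway repository: the temporary directory, the file name, and the "Example" identity are made up for illustration.

```shell
# Throwaway repository; the directory, file, and identity are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

echo hello > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo world >> file.txt
git commit -q -a -m "second"

# Print the raw commit object: a "tree" line (the snapshot), a "parent"
# line (the previous commit's hash ID), then author, committer, and message.
git cat-file -p HEAD

# Pull the parent hash ID out of the metadata.
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "parent of the second commit: $parent"
```

The parent line of the second commit should be the hash ID of the first one, which is exactly the backwards link described above.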
So, if we have a chain of commits, all in a row, we can draw them like this:
... <-F <-G <-H
where H stands in for the actual hash ID of the last of these commits. Git can yank commit H back out of its big database-of-all-commits,[1] using the hash ID. That gets Git the saved snapshot, plus the metadata. The metadata stores the raw hash ID of earlier commit G.
Git can use this to yank commit G back out of its database, which gets a different saved snapshot, and the metadata for G ... which includes the hash ID of earlier commit F. So now Git can grab F, which has a snapshot and metadata. This goes on and on: Git works backwards, from the last commit to the first.
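The backwards walk can be imitated by hand. The following is a rough sketch, not what Git does internally: it builds three empty commits with the made-up subjects F, G, and H in a scratch repository, then follows the parent links one at a time. It assumes a simple chain with no merges.

```shell
# Scratch repository with three commits; names and identity are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

for msg in F G H; do
    git commit -q --allow-empty -m "$msg"
done

# Start at the last commit and follow parent links until there are none.
commit=$(git rev-parse HEAD)
visited=""
while [ -n "$commit" ]; do
    subject=$(git log -1 --format=%s "$commit")
    visited="$visited$subject"
    echo "$commit $subject"
    # %P prints the parent hash ID(s); it is empty for the very first commit.
    commit=$(git log -1 --format=%P "$commit")
done
```

Running plain git log does the same walk for you; the loop just makes each hop visible.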
But you, or someone or something anyway, has to get Git this last hash ID. That's where a branch name is useful: a branch name, by definition, stores the last hash ID in the chain. If you:
git checkout somebranch
(or use git switch to do the same) you get something that we might draw like this:
...--F--G--H <-- somebranch (HEAD)
The special name HEAD remembers which name you told Git to use. The name holds hash ID H. If you make a new commit now, Git will write out a new commit, which gets a new random-looking (but unique and not actually random at all) hash ID, which we'll call I. Git then writes I's hash ID into the name somebranch:
...--F--G--H--I <-- somebranch (HEAD)
So that's how Git remembers which commit is the last one. It's in the branch name!
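You can watch the name move. In this sketch (scratch repository, made-up identity, empty commits standing in for H and I), the hash ID stored in somebranch changes when the second commit is made, and the new commit's parent is the old tip.

```shell
# Scratch repository; directory and identity are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Name the branch ourselves rather than rely on the default branch name.
git checkout -q -b somebranch
git commit -q --allow-empty -m H
before=$(git rev-parse somebranch)

# HEAD holds a name, not a hash ID.
git symbolic-ref HEAD     # prints refs/heads/somebranch

git commit -q --allow-empty -m I
after=$(git rev-parse somebranch)
parent_of_new=$(git rev-parse 'somebranch^')

echo "somebranch was: $before"
echo "somebranch now: $after"
```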
[1] This big database actually holds all of Git's internal objects. Commits are just one of four types of objects. A Git repository is basically two databases: this big one, and a smaller (well, usually smaller) one that maps names, like branch names, to hash IDs. The smaller database lets you find the hash IDs, and the big database holds the commits.
Detached HEAD mode
In detached HEAD mode, you tell Git: Don't store a name in the special name HEAD, store a raw hash ID instead. For instance, let's say you decide to look at historic commit G:
...--F--G <-- HEAD
         \
          H--I <-- somebranch
You can now look around at the files that came out of commit G. If you make a new commit now, Git stores the new commit as usual: it gets some big ugly hash ID, unique to it, but we'll call it J:
          J <-- HEAD
         /
...--F--G
         \
          H--I <-- somebranch
Now suppose you git checkout somebranch again, to get back to this:
          J ???
         /
...--F--G
         \
          H--I <-- somebranch (HEAD)
The name HEAD now holds the name somebranch, rather than the actual hash ID of commit J. How will you find commit J?
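Here is that situation reproduced as a sketch in a scratch repository (made-up identity, empty commits named after the letters in the drawings). After switching back to somebranch, no branch name leads to J any more.

```shell
# Scratch repository; directory and identity are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git checkout -q -b somebranch
for msg in F G H I; do
    git commit -q --allow-empty -m "$msg"
done

# Detach HEAD at commit G, two commits before the tip of somebranch.
git checkout -q --detach 'somebranch~2'
git symbolic-ref -q HEAD || echo "HEAD is detached"

# Commit J exists only because HEAD holds its hash ID directly.
git commit -q --allow-empty -m J
j=$(git rev-parse HEAD)

# Re-attach HEAD to somebranch.
git checkout -q somebranch

# List the branches that contain J; the list is empty.
orphan_branches=$(git branch --contains "$j")
echo "branches containing J: [$orphan_branches]"
```

Commit J is still in the object database at this point, so saving its hash ID in the variable j is the shell equivalent of writing it down.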
Reflogs
If you wrote the hash ID down, that's one way to find it. Git will hang on to commit J for at least 30 days by default, and you can look up the hash ID and type it in again. That's ... painful, at best.
Git also saves the hash ID for you in what Git calls a reflog. The reflogs are also kind of painful to use. Run git reflog any time, and Git will show you what's in the HEAD reflog. The hash IDs are the true names of each commit that HEAD pointed to,[2] whether directly (detached HEAD) or indirectly (through a branch name), in the last 30 or more days. But typically there are hundreds of these, and finding a useful one in the maze of twisty little hash IDs, all alike, is no fun.
[2] These are abbreviated for display. They also have numbered names, such as HEAD@{3} or HEAD@{14}. The number increments every time Git adds a reflog entry, while the hash ID, abbreviated or full, stays the same, always.
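As a sketch of the reflog lookup, the scratch repository below (made-up identity, empty commits) loses track of a detached commit J and then finds it again as HEAD@{1}. In a real repository the entry you want is usually buried much deeper, so treat the index 1 here as specific to this tiny example.

```shell
# Scratch repository; directory and identity are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git checkout -q -b somebranch
for msg in F G H I; do
    git commit -q --allow-empty -m "$msg"
done

# Make commit J on a detached HEAD, then go back to the branch.
git checkout -q --detach 'somebranch~2'
git commit -q --allow-empty -m J
j=$(git rev-parse HEAD)
git checkout -q somebranch

# Show every value HEAD has held, newest first.
git reflog

# HEAD@{0} is where HEAD is now; HEAD@{1} is where it was just before.
found=$(git rev-parse 'HEAD@{1}')
echo "HEAD@{1} is $found"
```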
So what should you do?
If you don't care about finding your commits again later, just keep working in detached-HEAD mode. If you do care about finding them later, create a new branch name. Branch names are super-cheap: they just hold one of those big ugly hash IDs.
Use git branch newname to create the new branch name newname wherever you are right now. Then use git checkout or git switch to switch to it, so that HEAD is attached to that name. Or, combine these two steps: git checkout -b newname or git switch -c newname means create the name, then check it out / switch to it, all at once.
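Put together, the safe version of the detached-HEAD experiment might look like the sketch below. It runs in a scratch repository with a made-up identity, and it uses git checkout -b so that it also works on Git versions older than 2.23, which lack git switch.

```shell
# Scratch repository; directory and identity are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git checkout -q -b somebranch
for msg in F G H I; do
    git commit -q --allow-empty -m "$msg"
done

# Detach at G and make commit J, as before.
git checkout -q --detach 'somebranch~2'
git commit -q --allow-empty -m J
j=$(git rev-parse HEAD)

# Create the name and attach HEAD to it in one step
# (on Git 2.23 or later: git switch -c newname).
git checkout -q -b newname

# Wander off; J stays findable through the name.
git checkout -q somebranch
saved=$(git rev-parse newname)
echo "newname holds $saved"
```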