
After experimenting with rm and similar commands, git reflog no longer shows my past commits.

However, git log --reflog is still able to show them.

How does git log --reflog show dangling commits if it doesn't rely on git reflog?

Pacerier
  • When you say *my reflog*, how many reflogs are you looking at? Just the one for `HEAD`? Just the one for `master`? All of them? – torek May 02 '20 at 11:48
  • @torek, `reflog` is now empty while it used to have quite a number of lines. – Pacerier May 02 '20 at 11:50
  • Let me try a different way of phrasing that: How are you viewing the reflog(s)? Specifically, *what commands are you using?* And: **there is more than one reflog.** – torek May 02 '20 at 11:50
  • @torek, `git reflog`, output is empty. `git log --reflog` shows the past commits. – Pacerier May 02 '20 at 11:51

1 Answer


TL;DR

Reflogs do not hold commit ancestry: reflogs hold name-update history. It's the commit graph that holds commit ancestry. git log would like some set of starting points, but after getting them, it looks at the commit graph. git log --reflog normally alters the set of starting points, but if you've removed all the reflogs (or never had any, e.g., in a bare clone), it doesn't: you get the standard single starting point, HEAD.

Note that reflog entries eventually expire, after 90 days by default,* but the graph never expires. A new clone has no reflog history, but has all the graph linkage—and git log --reflog still shows multiple commits.


*By default, reflog entries expire after 90 days, or after 30 days if they are unreachable from the current tip; entries in the special reflog for refs/stash never expire. The details are beyond the scope of this answer.


Long

Experimentally:

rm .git/logs/HEAD

removes the HEAD reflog, but git reflog still shows some data. On the other hand:

rm -r .git/logs

removes all reflogs, after which git reflog shows nothing.

At this point, one might expect git log --reflog not to find anything. However, it apparently falls back on the same "use HEAD itself as the default" behavior that plain git log has. So now that there are no reflogs, git log --reflog does the equivalent of:

git log HEAD

which shows the current commit and its ancestors. Using:

git log --no-walk --reflog

you'll see just the one commit identified by HEAD.

This means that the answer to:

How does log --reflog work if it doesn't rely on reflog?

is that, once the reflogs are gone, it does nothing that plain git log does not already do:

  • When you supply particular starting commits to git log, Git shows those commits, and commits reachable from those commits, without using HEAD as a starting point. (Adding --no-walk makes this clearer.)

  • When you don't supply any particular starting commit, git log uses HEAD as its starting point. (Again, adding --no-walk makes this clearer.)

(When you do have some reflogs—which is the normal case—the --reflog argument supplies the reflog values as starting points, which disables the "use HEAD as starting point" action. If everything now makes sense, you can stop here!)
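The starting-point rule described above can be sketched as a toy model. This is not Git's actual code: the function name starting_points and the shortened hash IDs are hypothetical, chosen only to show the fallback to HEAD.

```python
def starting_points(explicit, reflog_values, use_reflog, head):
    """Toy model (not Git's source) of how the walk's starting set is chosen."""
    starts = list(explicit)              # commits named on the command line
    if use_reflog:
        starts.extend(reflog_values)     # adds nothing if the logs are gone
    if not starts:
        starts.append(head)              # nothing supplied: fall back to HEAD
    return starts


# Hypothetical hash IDs, shortened for readability.
print(starting_points([], ["a1", "b2"], True, "h0"))  # reflogs present
print(starting_points([], [], True, "h0"))            # reflogs deleted
```

In this model, --reflog with an empty set of reflogs is indistinguishable from passing no arguments at all, which matches the behavior observed in the experiment.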

A potential source of confusion

It's important, when using Git, to know what a commit does for you, vs what a branch name like master, or a reflog entry like master@{3}, does for you.

Each Git commit holds a full snapshot of all of your files, but that's not all that it holds. Each commit also holds some metadata. Much of this metadata—information about the commit—is pretty obvious as it shows up in git log output. This includes the name of whoever made the commit, an email address, and a date-and-time-stamp, along with the log message they provided.

Each commit itself has a unique hash ID, too. This hash ID is, in essence, the "true name" of the commit. It's how Git looks up the actual commit object, in its big database of all commits and other supporting Git objects.

A branch name like master simply holds the hash ID of one particular commit. This one commit is, by definition, the last commit in the branch. But a commit, such as the last one in the master branch, also can hold a commit hash ID. Each commit, in its metadata, has a list of hash IDs. These are the parents of the commit.

Most commits have just one parent hash ID. This forms these commits into simple backwards-looking chains. We can draw such a chain like this:

... <-F <-G <-H   <-- master

if we use uppercase letters to stand in for commit hash IDs. Here H is the hash ID of the last commit on master. Commit H itself contains, in its metadata, the actual hash ID of earlier commit G. So given commit H, Git can use this hash ID to look up commit G. That in turn supplies the hash ID of commit F.

Git can, in effect, walk this chain backwards. That's what git log does, normally. Using --no-walk tells git log: show me commits, but do not walk backwards through their chains; show me only the commits I specifically select via the command line. So with --no-walk, you will see only the commits you selected, and not their ancestry.
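As a rough illustration of the backwards walk, here is a toy model of the F, G, H chain. The PARENTS table and the walk_chain function are made up for this sketch; real Git reads parent hash IDs out of commit objects, and F is treated as a root commit here only to keep the example finite.

```python
# Toy graph for the chain drawn above: commit -> list of parent commits.
# F is treated as the root here only to keep the example finite.
PARENTS = {"H": ["G"], "G": ["F"], "F": []}


def walk_chain(start, no_walk=False):
    """Return the commits a simplified, linear 'git log' would visit."""
    shown = []
    commit = start
    while commit is not None:
        shown.append(commit)
        if no_walk:
            break                        # show only the selected commit
        parents = PARENTS[commit]
        commit = parents[0] if parents else None  # step to the parent
    return shown


print(walk_chain("H"))                # the whole chain, newest first
print(walk_chain("H", no_walk=True))  # just the commit that was named
```

Nothing in this walk consults a reflog: the parent links alone are enough to get from H back to F.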

Reflogs, like branch names, hold hash IDs. The reflogs are organized into one log per name (branch name, tag name, and so on) plus one log for the special name HEAD. These are, at least currently, stored in plain files in the .git/logs directory. Each log has entries (one line of the file per entry, in this case), and each entry corresponds to the hash ID that the name resolved to at some earlier time. You can use these to access the earlier values, so master@{1} tells Git to use the one-step-earlier value: before the most recent update to the name master, it resolved to some hash ID; now it resolves to some (probably different) hash ID; we want the one from one step back. The name master@{2} tells Git that we want the value from two steps back.

Note that these are name-update steps, not commit-backwards-arrow steps. Sometimes master@{1} is the same as master~1, master@{2} is the same as master~2, and so on, but sometimes these are different. The suffix syntax master~2 or master^2 operates on the commit graph. The suffix syntax master@{number} operates on the reflog for master.

(The current value of the branch name, master@{0}, isn't in the master reflog because it is in master itself. Updating master will take the current value and add it to the log, and then set the new value.)
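To make the difference between name-update steps and graph steps concrete, here is a small sketch. It assumes a hypothetical history in which master was created at F, advanced to G and then H, and was finally reset back to G. The list keeps the current value at index 0 purely for convenience; it is not meant to mirror the on-disk log layout.

```python
# Commit graph: commit -> list of parent commits.
PARENTS = {"F": [], "G": ["F"], "H": ["G"]}

# Values the name 'master' has held, newest first (hypothetical history:
# created at F, committed G, committed H, then reset back to G).
MASTER_VALUES = ["G", "H", "G", "F"]


def at_n(values, n):
    """Model of master@{n}: step back n name updates."""
    return values[n]


def tilde(commit, n):
    """Model of master~n: follow n first-parent links in the graph."""
    for _ in range(n):
        commit = PARENTS[commit][0]
    return commit


current = MASTER_VALUES[0]
print(at_n(MASTER_VALUES, 1))  # value before the reset
print(tilde(current, 1))       # first parent of the current commit
```

Here the two lookups disagree: one step back in the name's history is H, while one step back in the graph from G is F.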

You can have Git spill out the contents of some or all reflogs using git reflog. If there are no reflogs at all—which will be the case if you remove them all—nothing will come out here as there are no longer any saved values. However, all the names still have their values, and HEAD still exists and contains a branch name such as master.

Even more detail

Because of the way git log works, it can only show one commit at a time, even when it has several candidates to show next. To handle this, it uses a priority queue. You can, for instance, run:

git log <hash1> <hash2> <hash3>

using three actual hashes, or:

git log master develop feature/tall

which uses names to find hash IDs, or:

git log master master@{1} master@{2}

which uses two reflog entries (plus the branch name) to find hash IDs.

In all cases, Git inserts all the hash IDs into a priority queue.

Using --reflog as a command-line argument tells git log to take all the values from the reflogs and insert those into the queue.

If nothing goes into the queue, Git inserts the result of resolving HEAD to a hash ID instead.

At this point, the queue is presumably not empty, because if nothing else, we got a hash ID by resolving the name HEAD.¹

The git log command now enters a loop, which runs until the queue is empty. This loop works as follows:

  • Take the highest priority commit off the queue.
  • Use any selection-type arguments supplied to git log to decide whether to display this commit. If so, display the commit. (For instance, git log --grep selects for display commits whose log message contains the given string or pattern.)
  • If --no-walk is in effect, we're done with this commit. Otherwise, choose some or all of the parents of this commit to put into the queue, based on the --first-parent flag and any History Simplification selected.

(Note that if a commit is now, or ever has been, in the queue, git log won't put it back into the queue, so you will not see the same commit twice. The priority within the queue is affected by git log's sorting options.)
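The loop above can be sketched with a small priority queue. This is a simplified model under stated assumptions, not Git's implementation: the COMMITS table, its timestamps, and the toy_log function are invented for illustration, priority is simply "newest timestamp first", and --first-parent and History Simplification are left out.

```python
import heapq

# Hypothetical toy graph: commit -> (timestamp, parent commits).
COMMITS = {
    "A": (1, []),
    "B": (2, ["A"]),
    "C": (3, ["B"]),   # the commit HEAD resolves to in this model
    "D": (4, ["B"]),   # reachable only through a reflog entry
}


def toy_log(starts, head="C", no_walk=False, select=lambda c: True):
    """Return commits in the order a simplified 'git log' would show them."""
    if not starts:
        starts = [head]                  # default starting point
    queue, seen, shown = [], set(), []

    def push(commit):
        if commit not in seen:           # never queue a commit twice
            seen.add(commit)
            heapq.heappush(queue, (-COMMITS[commit][0], commit))

    for commit in starts:
        push(commit)
    while queue:
        _, commit = heapq.heappop(queue)  # highest priority = newest
        if select(commit):                # e.g. a --grep style filter
            shown.append(commit)
        if not no_walk:
            for parent in COMMITS[commit][1]:
                push(parent)
    return shown


print(toy_log(["C", "D"]))                # starts taken from reflog values
print(toy_log([]))                        # no starts: falls back to HEAD
print(toy_log(["C", "D"], no_walk=True))  # only the starting commits
```

With D supplied as an extra starting point, the otherwise unreachable commit shows up; with no starting points, only HEAD and its ancestors do.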

So, with --reflog, we give git log multiple starting points from the reflog entries, if there are reflogs. If there aren't any reflogs, git log uses its standard default: start with HEAD.

Regardless of whether we used --reflog or not, git log now walks commits using the parent linkage in the commits themselves. This does not depend on the arguments we supplied, except of course for --no-walk.²


¹ If there are no commits at all, or we're on an "unborn branch" created by git checkout --orphan, the queue would be empty at this point, but git log will have errored out while trying to resolve the name HEAD.

² Also, with the -g or --walk-reflogs argument, git log will not walk the commit graph. Instead, it walks the reflog entries.

The difference between --walk-reflogs and --reflog is that with --walk-reflogs, the whole priority queue thing is tossed out entirely: Git looks only at the reflogs. This also changes some output formats. In fact, git reflog show (which is what plain git reflog runs) is an alias for git log -g --abbrev-commit --pretty=oneline.
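A toy comparison of the two modes, under the assumption of a hypothetical HEAD reflog whose entries (newest first) are G, H, G. The function names and the output format are invented for this sketch and do not reproduce Git's real output.

```python
# Commit graph: commit -> list of parent commits.
PARENTS = {"F": [], "G": ["F"], "H": ["G"]}

# Hypothetical HEAD reflog values, newest first.
HEAD_REFLOG = ["G", "H", "G"]


def walk_reflogs(entries):
    """Model of -g: one row per reflog entry, in log order, no graph walk."""
    return [f"HEAD@{{{i}}} {commit}" for i, commit in enumerate(entries)]


def reflog_as_starts(entries):
    """Model of --reflog: entries seed a graph walk; each commit counts once."""
    seen = set()
    pending = list(entries)
    while pending:
        commit = pending.pop()
        if commit not in seen:
            seen.add(commit)
            pending.extend(PARENTS[commit])  # follow parent links
    return seen


print(walk_reflogs(HEAD_REFLOG))              # G is listed twice, F never
print(sorted(reflog_as_starts(HEAD_REFLOG)))  # F, G, H, each exactly once
```

In the first model, deleting the log leaves nothing to print. In the second, the entries are only seeds, and the parent links do the rest.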

torek
  • re *"`git log HEAD` which shows the current commit and its ancestors"*; But it doesn't show descendants. **How does `git log --reflog` show descendants when `git reflog`'s output is empty?** – Pacerier May 02 '20 at 12:14
  • I expanded the answer a lot, but the TL;DR is: reflogs do not hold commit ancestry; reflogs hold name-update history. It's the *commit graph* that holds commit ancestry. Note, too, that reflog entries eventually expire, after 90 days by default. The graph never expires. A new clone has no reflog history, but has all the graph linkage. – torek May 02 '20 at 12:56
  • I'd honestly put the TL;DR version at the top of your answer – Daemon Painter May 02 '20 at 13:04
  • @DaemonPainter: good idea, done. – torek May 02 '20 at 13:26