
I've always been sceptical about 'squashing' or rewriting history, and now I'm just about convinced that I should ban this practice from the repos I manage. But maybe somebody can answer this, and save the day for my squash-championing colleagues.

I just came across a line of code that I want to understand the reasoning behind, so I used `git annotate` to find out who wrote it and why. But it points to a 'squashed' commit, a long list of commit message headers about a myriad of features and bug-fixes, without detail. Not really helpful.

'Oh, but there's always the reflog,' I've been assured; 'information never actually gets lost in git!' Okay, glad to hear it! But I tried `git reflog <squashed-commit-hash>`, and got no output at all. Not helpful. I also tried `git rev-list <squashed-commit-hash>`, and got a list of hundreds of hashes, and after manually inspecting a few of them with `git show`, I've concluded they're not parts of what got squashed. Also not helpful.

So is it actually possible to find out which single commit contributed to that line of code, and see that commit's entire message? Can it be done in a single git command, without having to be a bash guru? Or has this information, in fact, actually been lost in git after all?

Michael Scheper
  • "I'm just about convinced that I should ban this practice from the repos I manage." How do you propose to do that, exactly? Git is a distributed version control system; what I do on my own system before I share my commits with you isn't any of your business (nor can you tell that I've done it). I guess you could do it by policy but in my opinion such a policy would be pointless. – ChrisGPT was on strike Aug 14 '18 at 16:36
  • "But it points to a 'squashed' commit, a long list of commit message headers, without detail. Not really helpful." The person who squashed the original commits should generally ensure that the commit message for the new commit _is_ helpful. This may involve editing the old commit messages, replacing them, or adding some kind of summary. Git actually prompts the user for the new commit message during a squash, providing the old messages as a starting point. Unhelpful commit messages from squashing can be caught in code review, just like regular commit messages and code changes. – ChrisGPT was on strike Aug 14 '18 at 16:41
  • So if, in a code review, we find the commit message is in fact unhelpful, is it possible to unsquash it and do it again? If not, then that adds more weight to my feeling that squashing is too dangerous a practice to allow. And yes, it would be by policy—whether squashing becomes part of our policy is something I've been considering carefully, and I'm afraid the 'but reflogs!' counterargument has collapsed now, in my view. – Michael Scheper Aug 14 '18 at 16:46
  • You don't have to unsquash anything. Amend the commit and you can edit the commit message. Editing history in Git isn't dangerous as long as you fully understand the ramifications. – Makoto Aug 14 '18 at 16:48
  • I think the "but reflogs!" argument answers the wrong question. Whether I'm submitting original or rewritten commits is irrelevant: in either case my responsibility as the person submitting the work is to ensure that it is correct, clear, passes tests, etc. If it is discovered during review that my work doesn't meet whatever standards exist for the project I need to fix it. Again, you _have no way to know_ whether a given commit is "original" or not. – ChrisGPT was on strike Aug 14 '18 at 16:50
  • @Makoto: Great! So where do I find the original commit messages? – Michael Scheper Aug 14 '18 at 16:50
  • Well...so long as the commits haven't actually been published, they'd be on the remote. If they are (and I suspect they were), and the original committer elected to discard them, then they're gone forever (unless they want to dive into the reflogs for them). If you have issues with the final resulting commit message, a `git commit --amend` will let you edit *that* specific commit message. – Makoto Aug 14 '18 at 16:51
  • Commits can't exactly be unsquashed, but the person who squashed them will generally still have access to the original commits, should they need them. In most cases they won't be necessary. By default, when squashing, Git opens up your default editor for a commit message (just like with a brand new commit). But in this case it pre-populates all of the original commit messages. The user can then edit if necessary before committing. If you're seeing a series of commit messages my guess is that the committer _didn't_ modify the default; in that case you'll see the original, complete messages. – ChrisGPT was on strike Aug 14 '18 at 16:52
  • My gut tells me that you'd benefit more from talking with the people that use this in their everyday workflow so you could get more accustomed to how they work. I use rebase and squash in certain contexts myself but I do what I can to leave helpful messages; as for the people you're working with who don't, you definitely need to talk to them. – Makoto Aug 14 '18 at 16:53
  • It's the 'everyday workflow' that I'm in the process of defining. I've picked up managing a rather chaotic project, and it's clear that a lot of people didn't really know what they were doing. 'Neat' commit messages, summarising work in progress, do sound appealing, but they're not worth wiping history, which seems to have happened in this case. `git show` for commits from 2015 was once very helpful in this project, but it seems a single command somebody typed in 2017 has closed that avenue now. ☹ – Michael Scheper Aug 14 '18 at 17:01
  • Shifting your perspective a bit may help. Instead of focusing on a specific workflow to enforce (and specific behaviours to ban), focus on what a good pull request looks like. Its commit message should have characteristics X, Y, and Z; it should be atomic; it should relate to a ticket in the tracker of your choice… That's where your "quality control" can kick in most effectively. Whether I meet those goals via `rebase` or similar doesn't really matter. (Forbidding rewriting _on shared, long-lived branches_ is a good idea. Some shared repos allow this to be enforced, and provide PR templates.) – ChrisGPT was on strike Aug 14 '18 at 17:11
  • @Chris: I think my bad experiences with squashing come from just that, people squashing shared, long-lived branches. And most of my git experience comes from many years of in-house repos, so I guess I haven't realised the benefits of all of GitHub's features, like 'pull requests', either; I suppose there's nothing wrong with people tidying up commit histories within their feature branches for one of those. But scrubbing away helpful information from release branches or master, just because it looks 'neat' or 'simple' to some, still seems like madness to me. I just don't see the benefit. – Michael Scheper Aug 14 '18 at 17:35

3 Answers

Since other answers haven't addressed it, I think this needs to be called out:

'Oh, but there's always the reflog,'

No. 100% false. The reflog is both local and temporary.

It's true that git does a very good job protecting against loss of history (unless you specifically direct it to lose history), but to say that information never gets lost is wildly inaccurate and dangerous. People need to understand the ways information can (and can't) be lost so that they know when and how to rely on the protections git really does offer.
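To make the "local and temporary" point concrete, here is a minimal shell sketch, assuming a POSIX shell and a reasonably recent git. The repository paths, file name, and commit messages are made up for illustration. It squashes two commits, then checks that the old tip is still listed in the reflog of the repository where the squash happened, but is absent from a fresh clone.

```shell
#!/bin/sh
# Sketch only: every path, file name and commit message here is hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"

git commit -q --allow-empty -m "initial"
echo one > file.txt; git add file.txt; git commit -q -m "add line one"
echo two >> file.txt; git commit -q -am "add line two"
old_tip=$(git rev-parse HEAD)        # tip of the un-squashed history

# Squash the last two commits into one.
git reset -q --soft HEAD~2
git commit -q -m "squashed: add file"

# In this repository, the HEAD reflog still lists the old tip.
local_hits=$(git reflog --format=%H | grep -c "$old_tip")

# A clone over the normal transport receives only reachable commits and
# starts with its own, fresh reflog. (--no-local is used because a plain
# local-path clone copies the whole object store, which would hide this.)
git clone -q --no-local "$work/origin" "$work/clone"
cd "$work/clone"
if git cat-file -e "$old_tip^{commit}" 2>/dev/null; then
    in_clone=yes
else
    in_clone=no
fi

echo "reflog entries for old tip in origin: $local_hits"
echo "old tip present in clone: $in_clone"
```

The "temporary" half is controlled by the `gc.reflogExpire` and `gc.reflogExpireUnreachable` settings, which default to 90 and 30 days respectively; after that, garbage collection is free to drop the entries and the commits they were keeping alive.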

As to the broader issue:

Rewriting shared histories is generally not a good practice. Most people who advocate for aggressive rebasing want to rewrite histories before pushing to the remote. Not only is that a far less controversial practice, it's pretty much impossible to ban even if you want to (because when I push a commit, you don't know whether I created it by squashing several previous commits or not).
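As a rough illustration of why a squash can't be detected after the fact, the sketch below (hypothetical file name and messages, POSIX shell assumed) squashes two commits and then inspects the raw commit object. In an unsigned commit, the only header fields are the tree, the parent, the author, and the committer; nothing points back at the commits that were folded in.

```shell
#!/bin/sh
# Sketch only: the file name and commit messages are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

git commit -q --allow-empty -m "initial"
echo hello > greeting.txt; git add greeting.txt; git commit -q -m "WIP greeting"
echo world >> greeting.txt; git commit -q -am "WIP finish greeting"

# Squash the two WIP commits into one.
git reset -q --soft HEAD~2
git commit -q -m "Add greeting"

# Header field names of the resulting commit object (everything before the
# first blank line), joined onto one line.
headers=$(git cat-file -p HEAD | awk '/^$/ {exit} {print $1}' | xargs)

# Number of parents: a squashed commit has one, like any ordinary commit.
parents=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')

echo "header fields: $headers"
echo "parent count:  $parents"
```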

Acceptable granularity of the final commits, and acceptable documentation for each commit, are standards you'll need to set with the team. The enforcement will probably end up being more social than technical.

Mark Adelsberger
  • This answer answers the question most directly, so I've marked it as the 'accepted' answer. But I appreciate everyone's answers and comments; they've largely confirmed my understanding of squashing in git, and helped me realise when it is and isn't appropriate. – Michael Scheper Aug 15 '18 at 15:58
  • Nice answer @MichaelScheper, does this mean that I can't be guaranteed to be able to run `git show ` or even have complete granularity on atomic commits using `git bisect`? – tallamjr Sep 26 '19 at 15:31
  • @tallamjr I'm not 100% sure what you're asking. If you started with commits A, B, C and then squashed them into commit Z, you can `git show ` and see all the changes together. You may not be able to see the individual changesets anymore - i.e. `git show ` won't work if `A` has been cleaned up by gc, or if you're in a repo that never had `A` because it only got `Z`. Similarly bisect will not be able to see "the state of the code after `A` but before `B` and `C`" in the rewritten history – Mark Adelsberger Sep 26 '19 at 17:10
  • Thanks @MarkAdelsberger, that does indeed clarify things for me and helped my understanding overall. I was concerned about not being able to see at an atomic level which commit might have caused trouble if it had subsequently been squashed; for example if commit `B` is the culprit I would not be able to see that, only that commit `Z` was troublesome. – tallamjr Sep 26 '19 at 17:31

Squashing turns many commits into one, giving the committer the option to keep the commit message of each individual commit.

If they chose not to do that, then there's no way to discern information about the individual commits which were amalgamated into one.
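For what it's worth, here is a small sketch of the "keep the messages" path using `git merge --squash`, one of several ways to squash. The branch name `feature`, the file names, and the messages are invented for the example. Committing with `--no-edit` accepts the message git prepares, which contains the full text of each original commit message.

```shell
#!/bin/sh
# Sketch only: branch name, file names and messages are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
base=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

git commit -q --allow-empty -m "initial"
git checkout -q -b feature
echo a > a.txt; git add a.txt
git commit -q -m "Add a.txt" -m "Explains why a.txt is needed."
echo b > b.txt; git add b.txt
git commit -q -m "Add b.txt" -m "Alternative considered: reuse a.txt."

# Squash the feature branch onto the base branch.
git checkout -q "$base"
git merge -q --squash feature
git commit -q --no-edit                 # accept the prepared message as-is

msg=$(git log -1 --format=%B)
printf '%s\n' "$msg"
```

In a real session you would normally leave out `--no-edit`, so that git opens your editor with this text and you can tidy it up. If the committer deletes the bodies at that point, they are not recorded anywhere in the new commit.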

I should stress:

I've been assured; 'information never actually gets lost in git!'

This pertains only to the files under Git, which is often source code. Naturally, when one is rewriting history, all bets are off.

Since you can see who committed the work, your best bet to get more information about that specific line of code would be to conduct a code review with them on it. If they're no longer working with you, then all you have to go on is the whole squashed commit.

Makoto

So is it actually possible to find out which single commit contributed to that line of code, and see that commit's entire message?

As far as your project's history is concerned, the 'single commit' IS the squashed commit, so that is what will display as the source of the code change. This is intentional and the practical purpose is to combine several commits into one to keep the git history manageable. Here's an example of when I would squash commits on a project:

I'm working on a medium sized feature. It takes a couple days to complete. I have a tendency to commit pretty regularly as 'work in progress' (WIP) commits and always at the end of the day. My commit messages are often in the scope of the current feature I'm working on, not the project as a whole. Once the feature is complete and I merge into whatever working branch (let's say dev), these messages are difficult to parse in the context of the project, and the diffs between the commits I made on the feature branch don't really matter for maintainability. The diff that matters is before feature vs after feature, so I will combine all my commits into one labeled something like '(Ticket number) - (Feature name)'
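The workflow above could look roughly like the following sketch, assuming branches named `dev` and `feature` and a made-up ticket number `TICKET-123`. It uses `git reset --soft` back to the merge base, which is only one of several ways to squash. The check at the end shows that the squashed commit has exactly the same tree as the last WIP commit, so the "before feature vs after feature" diff is unchanged.

```shell
#!/bin/sh
# Sketch only: branch names, ticket number and file contents are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git commit -q --allow-empty -m "initial"
git checkout -q -b dev
git checkout -q -b feature

# A few days of work-in-progress commits.
echo "step 1" > feature.txt; git add feature.txt; git commit -q -m "WIP"
echo "step 2" >> feature.txt; git commit -q -am "WIP end of day"
echo "step 3" >> feature.txt; git commit -q -am "WIP fix typo"
tree_before=$(git rev-parse 'HEAD^{tree}')

# Collapse everything since the branch point into one commit.
git reset -q --soft "$(git merge-base dev feature)"
git commit -q -m "(TICKET-123) - Add feature"
tree_after=$(git rev-parse 'HEAD^{tree}')

# Number of commits on feature that dev does not have yet.
commits=$(git rev-list --count dev..feature)

echo "commits ahead of dev: $commits"
echo "tree unchanged: $([ "$tree_before" = "$tree_after" ] && echo yes || echo no)"
```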

Jeremy M
  • Yeah, it seems the person who squashed the branch abused this feature, because it was actually a release branch, for a number of tickets. But it's the lack of commit detail from the original commits that I'm most concerned about. I advocate a 'subject' line explaining what the feature/bugfix is, and, sometimes, a paragraph explaining the approach, and often, the alternatives considered. This information probably actually belongs in code comments or the ticket, but it's nevertheless helped explain code numerous times before, so I'm rather distressed about it being so hard to find. – Michael Scheper Aug 14 '18 at 16:38