Difference between git filter-branch subdirectory-filter & git subtree split -P

Question

I'm trying to understand the difference between git subtree split & git filter-branch for a particular use case. This question is similar to "Difference between git filter branch and git subtree?", but not exactly the same.

Given a repo with /sub/folder, execute these commands:

git checkout master
git checkout -b subtree-branch-1
git filter-branch --subdirectory-filter sub/folder

You end up with a branch that has just the commits that apply to /sub/folder. Now let's start again:

git checkout master
git subtree split -P sub/folder -b subtree-branch-2
git checkout subtree-branch-2

To me, it looks like I end up with the exact same result. TortoiseGit's Revision Graph looks the same, the logs for these two branches look the same, & the working directory looks the same. I've found a number of questions/posts that try to explain the difference between filter-branch & subtree split, but based on the above, I'm just not seeing it. And if they really do yield identical results, what was the point of introducing subtree split -P? Is it basically just an alias for filter-branch --subdirectory-filter?

`filter-branch` is designed to replace (some or all of) the commits in your repository by whacking on (some or all of) the references (branch and tag names and other such names). `git subtree` was developed as a less-dangerous, more-useful, more-constrained method of doing one well-defined job that doesn't require whacking on everything. So they *are* very similar, but `git subtree` is not deprecated and `git filter-branch` is (to be replaced with `git filter-repo`). — torek, Apr 10 '20 at 00:28
So basically, in the above situation, the result *does* come out to be identical - but git filter is more broad in what it can do. So git subtree was developed more or less to handle this one specific scenario (above) without doing unnecessary work (which is required to provide filter-branch's extra possible uses, but that isn't applicable to the above usage)? Something like that? — J23, Apr 10 '20 at 02:27
Yes, more or less like that - also `git subtree` now handles more cases than `git filter-branch`, as it has all the add/push/merge stuff in it. — torek, Apr 10 '20 at 04:57
The history of this is that `git filter-branch` existed first, some people started using it to do what `git subtree split` does, and then some subset of the group of people using it like that, plus the overall Git folks, came up with `git subtree`. — torek, Apr 10 '20 at 04:59
Gotcha. So in summary: in my use case above there is no difference. It's just that I happened upon a use case that's in the overlap of the two, but they also both have non-overlapping capabilities too. (Envisioning a Venn diagram). Wanna go ahead & post an answer so I can accept? :) — J23, Apr 10 '20 at 05:13

score 3 · Accepted Answer · answered Apr 11 '20 at 01:03

The short part is just as you put it in your comment summary:

... in my use case above there is no difference.

However, the general advice would be to use git subtree: it's more direct, less error-prone, and should continue working even if-and-when git filter-branch stops working someday.
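If you want to check the "no difference" claim for yourself rather than eyeballing a revision graph, one option is to compare the two result branches directly. The following is only a sketch on a throwaway repository: the file names, commit messages, and the temporary directory are invented for the demo, and it assumes that git subtree is installed (it lives in Git's contrib area, so some installations lack it) and that git filter-branch is still available in your Git.

```shell
#!/bin/sh
# Throwaway demo: build a tiny repo, extract sub/folder both ways, compare.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the question
git config user.name "Demo User"          # hypothetical identity, local to this repo
git config user.email "demo@example.com"

mkdir -p sub/folder
echo one > sub/folder/a.txt
echo top > README
git add .
git commit -q -m "first: touches both"
echo two >> README
git commit -q -am "second: outside sub/folder only"
echo three >> sub/folder/a.txt
git commit -q -am "third: inside sub/folder only"

# Approach 1: filter-branch rewrites the branch that is currently checked out.
git checkout -q -b subtree-branch-1
FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --subdirectory-filter sub/folder >/dev/null 2>&1

# Approach 2: subtree split creates a new branch and leaves master untouched.
git checkout -q master
git subtree split -P sub/folder -b subtree-branch-2 >/dev/null 2>&1

# Equal tree hashes mean the tips hold the same files with the same contents.
tree1=$(git rev-parse 'subtree-branch-1^{tree}')
tree2=$(git rev-parse 'subtree-branch-2^{tree}')
count1=$(git rev-list --count subtree-branch-1)
count2=$(git rev-list --count subtree-branch-2)

if [ "$tree1" = "$tree2" ]; then
    echo "trees: identical"
else
    echo "trees: differ"
fi
echo "commits: $count1 vs $count2"
```

On this toy history both branches should keep only the two commits that touch sub/folder. To run the same comparison on a real repository, the last block (from the rev-parse lines down) is the part to reuse; `git diff subtree-branch-1 subtree-branch-2` printing nothing is another quick signal that the tips match.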

More details

Essentially, you've independently re-discovered how the git subtree command came into existence: various users wanted to take their existing repository, extract some part of it—typically a library—and export that as a new Git repository. The git filter-branch command could do that, so that is what they did.

This was popular enough, and useful enough, that git filter-branch grew a filter named --subdirectory-filter dedicated to the job. See commit 685ef546b62d063c72b401cd38b83a879301aac4 by Johannes Schindelin, in 2007, first released in Git version 1.5.3.

Subtree splitting by itself was not really enough (and git filter-branch is a kind of dangerously overpowered tool), so in April 2009, Avery Pennarun introduced the git subtree command, starting with commit 0ca71b3737cbb26fbf037aa15b3f58735785e6e3, as an experimental script. This implemented split and, almost immediately, --rejoin and add as well. The first actually installable version, rather than an add-on merely distributed with Git, is from commit 0d31de303f9e8e28cc1649dbf41c1cc635bae2d8 by Ben Walton. All this was released in Git 1.7.11.

Recently, git filter-branch itself has been formally deprecated: it's difficult to use correctly, slow, and just generally not very nice. A new git filter-repo command is faster and more useful, though it does require that you know enough Python to use it, and it's still not distributed with Git itself. But git subtree, which has a specific purpose rather than a general one, will stick around, and should have the same usage in the future even if git filter-repo actually replaces git filter-branch entirely.
