
I have a monorepo with many different packages in it. I'd like to take a package and create a new repo for it. I would like to preserve commits pertaining to the files within the project folder.

I've tried git filter-branch with --subdirectory-filter, which rewrites the current branch so that it contains only the files from the specified folder.

git filter-branch --subdirectory-filter ./src/project 

git log --pretty=oneline
dc1fd15e212c2e916591d931f81d0f94b0312067 (HEAD -> master) j
a68625cffd59e0bc6efda226042b81979985f77c a1
612e28ee9a2a7e59e8da0dff2c0109f0c03fa216 a2
04335730f1e4ecfbd7882fcdc16d5f2402261b3f a3
3e76c4deecd9b019302bcdf667523dbc866479bd a4
01ca27e1801b1f80d7480110c2861d798b0e6893 a5

However, running the log command with the --all flag shows that there are many more commits than the ones on the master branch.

git log --pretty=oneline --all
f96e38e502b6bcab013f4505f40b0060fcafc461 a
06d7edc961690a9e2ec406d64dcee57443441926 b
06723ca315e8f2aad83d4d7e3bfcb095456329c6 c
f7f3ff04c873ba573602e8ed24b88394284d1e7a d
9d355e1802fdb6a44199d7f798d4dfc59ab73244 e
00d3e0d524d7f44bd41ea0e12583e234b7db941c f
97a245266a28d75baffa43cc8fc19a8adcf89fe3 g
faa1c96d7789db06b216be5d989f8d809d626a32 h
a5e892fcb2eb4f05c5cb4f7f0adbf3dfe5bd29b5 i
dc1fd15e212c2e916591d931f81d0f94b0312067 (HEAD -> master) j
68583660582d9184f2ea29e9c74da00793775245 k
a68625cffd59e0bc6efda226042b81979985f77c l
8a224dcebdfc18b13001c600113104f20ea0334e m
4c1606853902e1e3711892e5515e51fee45dcfa2 n
4ecd26965d13266a51f807af3ea88382d5cf8ba8 o
612e28ee9a2a7e59e8da0dff2c0109f0c03fa216 p
31cf0b36a977fa7d2d16fb66465fd59f8a9e274a q
dcc685af381a520795bfafe8df667aeb2cf087e6 r
a2d39ff1e21541e0a638b39877ab19fee2c162d0 s
9a5775e3bc3194bb021a0687140f976f6751cee1 t
2b9535729525537811745456f8917e200effb44f u
f4813470ee84b740e398ed09a68121982f6d171e v
4d46ecbe227d191fc7be0260ff6ddb7c2bd6d759 w
aeb866d98e78fed9aef3e9cc721b21b9d051438c x
e1ac5f9505f75290833a28c6b27cff659f734b24 y
9b3db9df98df1a20da1b8dda1200c2d09603ee2d z
def782319c775c68698666eb5f3fe828a70bf7a6 h
006d44e9f1abb7bba7cb9bdc1dcba09b41a3297d i
1e7eb9b30a6402914344c5ef038076a26bdd4a65 j
2e55da67d626db00043eeddbe1204afd3e7e5790 k
65e868a89ff22412f939ffc71fea8c6cef016683 l
aadc31aa541da8fe0f25af5aec9a967f6c5172f1 m
ed865f5499640eac12cd0b0048a54224898cf998 n
7c51acdc3e57f3c28fdcc9ea5c3c368b29991d9b o
06770a789e66dddc1e87257a4f64ca42e9fed6cf p
8148c5cdaa920900eac1750c2453ead6446b2d08 q

Each of these commits has files and data that do not pertain to the desired subdirectory.
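One likely source of the extra commits is refs other than master: filter-branch saves the pre-rewrite branch tip under refs/original/, and any other branches or tags also count for --all. The following is a minimal sketch for checking that, assuming a throwaway repository whose paths and commit messages are made up for illustration (none of them come from the real monorepo).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository standing in for the monorepo (all names are hypothetical).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

mkdir -p src/project src/other
echo one > src/project/a.txt;    git add -A; git commit -q -m "project: add a"
echo two > src/other/b.txt;      git add -A; git commit -q -m "other: add b"
echo three >> src/project/a.txt; git add -A; git commit -q -m "project: edit a"

# Skip the advisory pause that newer git versions add to filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --subdirectory-filter src/project >/dev/null 2>&1

# List every ref; the backup of the old history sits under refs/original/.
git for-each-ref --format='%(refname)'

# Compare what master contains with what all refs together keep reachable.
on_master=$(git rev-list --count master)
reachable=$(git rev-list --count --all)
echo "master: $on_master, all refs: $reachable"
```

In this toy repository master ends up with 2 commits, while 5 commits stay reachable through all refs because the backup ref still points at the 3 original ones.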

How do I delete these files? This is where bfg came in.

Using the -p (--protect-blobs-from) flag I can pass in all the refs in master.

bfg -D '*' -p "`git rev-list master | tr '\n' ','`"

This removes all the files that are not in the master branch from the entire repository history.

However, if you run git log --all you still see all of the extra commits, now with no files in them.

I am looking for a way to provide a directory to a command (e.g. ./src/project) and have a repo that has only files and commits pertaining to the files in that directory.

How can I remove these commits with no file changes?

There is a PR for bfg that adds a --prune-empty flag, which could help but is not a viable solution until it's merged in, and I'd seemingly still need to use both filter-branch and bfg.

I've also tried to clone this branch. Ideally there would be a way to take master and create an entirely new repository from it with only those refs.

Update:

I've extended @BlythMeister's post. The filter-branch command in the article leaves one issue: it keeps the directory at its original path, whereas --subdirectory-filter moves the contents of the subdirectory to the root. So I have to run --subdirectory-filter afterwards.

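# export DIR so it is visible inside the filter, which filter-branch evaluates in its own shell
export DIR=./src/project
git filter-branch --index-filter 'git rm --cached -qr --ignore-unmatch -- . && git reset -q "$GIT_COMMIT" -- "$DIR"' --prune-empty -- --all
# -f overwrites the refs/original backup left behind by the first run
git filter-branch -f --subdirectory-filter "$DIR"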
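The two passes might be collapsible into one, since --prune-empty and --subdirectory-filter can be given together and `-- --all` rewrites every ref. The sketch below is an alternative to the commands above, not the article's method. It assumes a throwaway repository with made-up paths, and it also deletes the refs/original backups afterwards so that only the rewritten history stays reachable.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository standing in for the monorepo (all names are hypothetical).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

mkdir -p src/project src/other
echo one > src/project/a.txt;    git add -A; git commit -q -m "project: add a"
echo two > src/other/b.txt;      git add -A; git commit -q -m "other: add b"
echo three >> src/project/a.txt; git add -A; git commit -q -m "project: edit a"

DIR=src/project
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer git

# One pass: hoist $DIR to the root, drop commits that become empty, rewrite all refs.
git filter-branch --prune-empty --subdirectory-filter "$DIR" -- --all >/dev/null 2>&1

# Delete the backup refs that filter-branch saved, then discard unreachable objects.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

on_master=$(git rev-list --count master)
reachable=$(git rev-list --count --all)
echo "master: $on_master, all refs: $reachable"

# Files now sit at the root of the tree instead of under src/project.
git ls-tree --name-only HEAD
```

Deleting refs/original is irreversible once gc has run, so this is best tried on a fresh clone rather than on the only copy of the repository.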
ThomasReggi

2 Answers


A while ago I had this exact issue where I wanted to pull a bit of one repo out complete with history to split one repo into 2.

I didn't find a single good guide, but worked out how to do it and wrote my own guide on my blog here: https://www.blyth.me.uk/2017/06/07/migrating-multiple-folders-between-git-repositories/

I hope this helps!

BlythMeister
  • Thanks a bunch for the article! This `filter-branch` does work. However, I still have some empty commits, and because I am running two `git filter-branch` commands (see my appended question) my history is now duplicated. – ThomasReggi Apr 26 '18 at 17:58
  • Why do you need to run it twice? – BlythMeister Apr 26 '18 at 18:19
  • As you can see in my update, my package is in `./src/package` and I'd like the contents of that dir to be at the root of the repo. Your script preserves the dir but at its current location. – ThomasReggi Apr 26 '18 at 18:26
  • Ahh, right. That's an easy fix. Once you have got everything cleaned up, you can just move the directory in a clean commit – BlythMeister Apr 26 '18 at 18:27
  • Git will know that bits have moved and the history will all still tally up. – BlythMeister Apr 26 '18 at 18:28
  • I will be running this script regularly, in fact all commits to this submodule will be generated this way (in a CI/CD pipeline). I'd rather preserve history as well as I can. – ThomasReggi Apr 26 '18 at 18:29
  • As part of the linking repository step, if you move the folder to the location you require and commit, you will keep the history intact. You will just have 1 extra commit for the move – BlythMeister Apr 26 '18 at 18:32
  • I see, yeah I could drop that extra `filter-branch`, but your method still leaves empty commits and tags from the source that I don't want. See my method: it is quicker than yours, and what you see in the original filter-branch is what you get. – ThomasReggi Apr 26 '18 at 18:35
  • I've never seen it have empty commits when I ran it...I've done it about a dozen times. – BlythMeister Apr 26 '18 at 18:37

Here's a method I put together that is pretty fast. It uses --subdirectory-filter and then a clone / merge to get only the contents and commits from a given branch, not the entire commit history.

#!/usr/bin/env bash
# clone the original monorepo
git clone "$REPO" example
# change directory into monorepo
cd example
# remove origin for safety reasons
git remote remove origin
# filter the package / dir you want to be the new submodule / repo
git filter-branch --subdirectory-filter "$SUBDIR"
# change directory up one 
cd ../
# create a new directory
mkdir example-subdir
# change directory into the new directory
cd example-subdir
# git initialize
git init
# add the filtered monorepo as a remote 
git remote add source ../example
# fetch the source repo
git fetch source
# merge the master branch into this master branch
git merge source/master

Now you have a repo with just the changes / commits from that specific directory.
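To check that claim, the steps can be replayed on a toy repository and the commit counts compared. This is a minimal sketch under stated assumptions: the repository contents, paths, and commit messages are invented for illustration, and a locally created repo stands in for the clone of `$REPO`.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# Stand-in for the cloned monorepo ("example" in the script above); contents are hypothetical.
git init -q example
cd example
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
mkdir -p src/project src/other
echo one > src/project/a.txt;    git add -A; git commit -q -m "project: add a"
echo two > src/other/b.txt;      git add -A; git commit -q -m "other: add b"
echo three >> src/project/a.txt; git add -A; git commit -q -m "project: edit a"

export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer git
git filter-branch --subdirectory-filter src/project >/dev/null 2>&1
cd ..

# Fresh repository that receives only the filtered master branch.
git init -q example-subdir
cd example-subdir
git symbolic-ref HEAD refs/heads/master
git remote add source ../example
git fetch -q source
git merge -q source/master

# The filtered clone still carries its backup ref; the new repository does not.
old_total=$(git -C ../example rev-list --count --all)
new_total=$(git rev-list --count --all)
new_master=$(git rev-list --count master)
echo "filtered clone, all refs: $old_total"
echo "new repo, all refs: $new_total, master: $new_master"
```

The default fetch refspec only copies branches (plus tags that point into the fetched history), which is why the refs/original backup stays behind in the filtered clone.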

ThomasReggi