
I'm using git-subtree to extract a directory from my project.

git subtree split --prefix=src/SubProject --branch=SubProject origin/master

Given that this is how I want to create the split in the first place (specifically, without --rejoin), how can I split only the new changes from origin/master into SubProject on successive runs?

For smaller projects this has been working fine. I can live with ~5 seconds to split the project. But on larger repositories this can take quite some time. I've seen it take up to five minutes per split. Some of the projects I'd like to work with have upwards of 45 sub projects in one repository.

I've tried many things that I thought might work but each has failed in one way or another. My requirements are:

  • Must not mess with origin/master in any way (so for the most part --rejoin is out of the question)
  • It must not add extra merge commits (think: --ff-only when getting new changes from origin/master)
  • The SubProject repository must have stable commit IDs. That is, after one or more incremental updates to SubProject, its history should contain the same commit IDs it would get if I re-ran the original command at the top of this post.
  • It must be automated and require no manual intervention

I'm not afraid of a complex solution but it needs to be automated. Failure because of history changing is fine; at that point the script can fall back to build the entire thing from scratch. In that case the user knows what they've done so they get to sit through a very long process of rebuilding from scratch. :)
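
One way to test any incremental scheme against the stable-ID requirement is to compare the tip of the incrementally built branch with the commit a full split produces. The following is a minimal sketch, assuming git-subtree is installed and the function is called from the top level of the working tree; the name `verify_split` and its return codes are made up for illustration. It relies on `git subtree split` printing the resulting commit ID on stdout when no `--branch` is given. The full split it runs is as slow as the original command, so it is better suited to occasional checks than to every run.

```shell
# verify_split PREFIX BRANCH UPSTREAM  (hypothetical helper, not part of git)
# Returns 0 if BRANCH points at the commit a full split of UPSTREAM produces,
# 1 if the IDs differ, 2 if either ID could not be determined.
verify_split() {
    vs_prefix=$1
    vs_branch=$2
    vs_upstream=$3

    # Full split from scratch; progress goes to stderr, the commit ID to stdout.
    vs_expected=$(git subtree split --prefix="$vs_prefix" "$vs_upstream" 2>/dev/null) || return 2

    # Tip of the branch that was built incrementally.
    vs_actual=$(git rev-parse --verify --quiet "$vs_branch") || return 2

    [ "$vs_expected" = "$vs_actual" ]
}

# Example call, using the names from the question:
#   verify_split src/SubProject SubProject origin/master || echo "rebuild from scratch"
```

A non-zero result is the point at which a script could fall back to rebuilding the whole branch.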

--rejoin

I tried using --rejoin and keeping an extra copy of origin/master around to just be a container for the altered history added by --rejoin. The problem I ran into here was that I was not able to either git rebase origin/master or git merge --ff-only origin/master without intervention. I need to be able to do this in an automated way so this was not acceptable.

merge commits

I was able to get it to work as I wanted with git merge origin/master, but it resulted in a merge commit. Since this merge commit will never go upstream, the history going forward would, I think, be impossible to predict, so a fresh git subtree split in a pristine environment would not be able to reproduce the exact same history. I could be wrong on this. If so, please explain to me how this will be safe. :)

commit ranges

I experimented with using a commit range and was able to create a new subtree split in SubProject that contained only the commits from a certain point in time to HEAD. This would probably work, except that it appears to generate a new set of commit IDs, so I do not think it is an option.

Beau Simensen
  • Just a thought: can we make a branch first and remove all other directories, and merge things that are done to that particular branch and push? – Hari K T Feb 14 '13 at 01:40
  • It seems that as of now (git 1.8.1) this functionality is missing. The only way to make git "remember" previous splits is with the `git subtree split --rejoin` option. – Maic López Sáenz Feb 21 '13 at 01:12
  • I could handle `git subtree split --rejoin` if there was a way to get new information from the upstream repository into the branch without doing a merge commit. I think then the history on the subtree split would always be stable and could be recreated? Maybe I'm wrong on that. – Beau Simensen Feb 21 '13 at 20:18
  • Are you still interested in an answer to this? – Chronial Apr 27 '13 at 18:19
  • @Chronial of course! :) – Beau Simensen May 03 '13 at 19:19
  • `That way, future splits can search only the part of history that has been added since the most recent --rejoin` -> If you could manually specify a pair of commits (one from origin/master, the other from SubProject) to search from/append to (instead of auto-detection using a previous --rejoin), would that be enough for you? – Vi. May 06 '13 at 17:49
  • @Vi. I think that would probably work, but I'd have to find a way to get the commit of the subproject to use at a later date so that I could cache it / store it for the next run. – Beau Simensen May 07 '13 at 00:47
  • Damn, totally interested in this. Looks like there's still no solid solution in mid-2015. – lkraav Apr 23 '15 at 17:32
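
The caching step raised in the comments above (remembering one commit from origin/master and the matching commit from SubProject between runs) could be sketched with plain refs. This is only an illustration: the `refs/subtree-cache/` namespace and the function names are invented here and are not something git-subtree reads or writes.

```shell
# save_pair NAME UPSTREAM_COMMIT SPLIT_COMMIT  (hypothetical helper)
# Records which upstream commit the last split covered and which split
# commit it produced, under an invented ref namespace.
save_pair() {
    git update-ref "refs/subtree-cache/$1/upstream" "$2" &&
    git update-ref "refs/subtree-cache/$1/split" "$3"
}

# load_pair NAME  (hypothetical helper)
# Prints "UPSTREAM_COMMIT SPLIT_COMMIT", or returns 1 if nothing is cached.
load_pair() {
    lp_upstream=$(git rev-parse --verify --quiet "refs/subtree-cache/$1/upstream") || return 1
    lp_split=$(git rev-parse --verify --quiet "refs/subtree-cache/$1/split") || return 1
    printf '%s %s\n' "$lp_upstream" "$lp_split"
}

# Example, after a successful split of origin/master into SubProject:
#   save_pair SubProject "$(git rev-parse origin/master)" "$(git rev-parse SubProject)"
# and on the next run:
#   pair=$(load_pair SubProject) || echo "no cached pair, do a full split"
```

Keeping the pair in refs rather than in a side file means the cached commits stay reachable and are not garbage collected; a missing pair is the signal to fall back to a full split.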

1 Answer


I implemented a patch for git subtree split to make this easier to experiment with. It adds a --graft-parent option that lets you explicitly specify the parent to use for a split commit that would otherwise have no parents:

$ git init
$ for i in {1..100}; do
   echo $i >q
   git add q
   git commit -m $i
   mkdir -p qqq
   echo $i > qqq/w
   git add qqq/w
   git commit -m "qqq/$i"
done

$ # let's do full split
$ /home/vi/src/git/git/contrib/subtree/git-subtree.sh split \
    --prefix=qqq --branch qqq HEAD
...7/200 (6)...58/200 (57)...142/200 (141)...176/200 (175)...
Created branch 'qqq'
f5120d3e676e2966802c8829b13a34c8d0c2dac4

$ # now let's do partial split
$ /home/vi/src/git/git/contrib/subtree/git-subtree.sh split \
    --prefix=qqq --branch qqq2 HEAD~100
...20/100 (19)...
Created branch 'qqq2'
3632fb9fc5c7a7f0b4bf8c6743e2cd372a6d8e52

$ # Now let's "continue the work" on qqq2
$ /home/vi/src/git/git/contrib/subtree/git-subtree.sh split \
   --prefix=qqq --branch qqq2 \
   --graft-parent=3632fb9fc5c7a7f0b4bf8c6743e2cd372a6d8e52 \
   HEAD~100..HEAD
Grafting 3632fb9fc5c7a7f0b4bf8c6743e2cd372a6d8e52\n
...10/100 (9)...
Updated branch 'qqq2'
f5120d3e676e2966802c8829b13a34c8d0c2dac4
Vi.
  • +1 nice solution, could maybe use two more sentences on how to use it. – Chronial May 07 '13 at 13:26
  • It's not a complete solution yet. Note: in the source code of git-subtree I see various caches, "prior"... I don't yet know 100% what they are for. – Vi. May 07 '13 at 15:00
  • @Vi. this looks like an interesting solution. How stable do you think it is? If it works as advertised it might be pretty close to what I'm looking for. :) – Beau Simensen May 08 '13 at 16:54
  • It does one thing: if there are no parents going to be added (i.e. first commit), "graft" your parent (should be explicit 40-digit SHA-1 id) to it. Everything else should work as without the option. Beware about specifying merge commits, maybe you'll need to specify multiple parent commits in this case. In general, test the solution before deploying into production (with or without my patch). – Vi. May 08 '13 at 20:45
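
Why the grafted parent matters for stable IDs: a commit ID is a hash over the tree, the parent IDs, the author and committer data, and the message. The sketch below uses only core git (`git mktree`, `git commit-tree`) to show that the same content with the same parent yields the same ID, while the same content without a parent does not. The identity, dates, and messages are made-up values chosen to keep the run reproducible.

```shell
# Fixed identity and dates (made-up values) so repeated commits are bit-identical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE='2013-05-08 00:00:00 +0000'
export GIT_COMMITTER_DATE='2013-05-08 00:00:00 +0000'

# Scratch repository so nothing touches a real project.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# An empty tree is enough: only the commit metadata is of interest here.
tree=$(git mktree </dev/null)

# A root commit, then the "next" commit created twice on top of it.
root=$(git commit-tree -m root "$tree")
with_parent=$(git commit-tree -m next -p "$root" "$tree")
again=$(git commit-tree -m next -p "$root" "$tree")

# The same "next" commit, but with no parent at all.
orphan=$(git commit-tree -m next "$tree")

echo "with parent:       $with_parent"
echo "with parent again: $again"
echo "without parent:    $orphan"
```

The first two IDs printed are identical and the third differs, which is consistent with a bare commit-range split producing new commit IDs and with grafting the previous split tip as the parent reproducing the full-split ID in the session above.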