I need to include some code previously not under version control into a git repository already containing some commits.

What I want to automate is the finding of a suitable "parent" commit which the new code will be a child commit of.

After some testing I think git diff-tree will work best (since only one folder contains the relevant code for the testing of "parentship").

My approach is like this:

  1. loop through all existing commits and note the sha1 of each commit and the sha1 of its relevant subtree
  2. copy new files to repository, add them to the index
  3. note sha1 of relevant subtree in the index
  4. compare the existing relevant subtrees with the new candidate and calculate a "similarity" using git diff-tree or something similar
  5. choose the most similar existing subtree and make its commit the parent of the new commit, i.e. check out the new parent (or checkout --orphan if no suitable parent can be found), empty the working directory, fill it with the new files and commit.
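
Steps 1 and 3 above could be sketched roughly as follows. This is only an illustration, not a tested tool: the helper names (`git`, `subtree_of_commit`, `subtree_of_index`, `collect_subtrees`, `group_by_subtree`) are made up for this example, and it assumes the relevant folder is a directory tracked at the same path in every commit of interest. It relies on `git rev-list --all`, `git rev-parse <commit>:<path>` and `git write-tree --prefix=<path>/`.

```python
import subprocess


def git(*args, repo="."):
    """Run a git command in `repo` and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def subtree_of_commit(commit, path, repo="."):
    """Object id of `path` inside `commit`, or None if the path is absent."""
    try:
        return git("rev-parse", "--verify", "--quiet",
                   f"{commit}:{path}", repo=repo)
    except subprocess.CalledProcessError:
        return None


def subtree_of_index(path, repo="."):
    """Tree id of `path` as currently staged (step 3); writes a tree object."""
    return git("write-tree", "--prefix=" + path.rstrip("/") + "/", repo=repo)


def collect_subtrees(path, repo="."):
    """Step 1: (commit, subtree id) pairs for every commit that has `path`."""
    pairs = []
    for commit in git("rev-list", "--all", repo=repo).splitlines():
        tree = subtree_of_commit(commit, path, repo=repo)
        if tree is not None:
            pairs.append((commit, tree))
    return pairs


def group_by_subtree(pairs):
    """Map each distinct subtree id to the commits that contain it."""
    groups = {}
    for commit, tree in pairs:
        groups.setdefault(tree, []).append(commit)
    return groups
```

Grouping by subtree id means each distinct version of the folder only has to be compared once, and an exact match with the staged subtree shows up immediately as an identical id.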

What's missing is a way to calculate the similarity! Maybe someone can give me a hint which combination of flags will help...

The code looks almost like PASCAL if that's important.

Onur

1 Answer

Wouldn't git diff --numstat be suitable for you here? You can restrict the diff to a particular file or path, and the output is machine-friendly.
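
One possible way to turn the `--numstat` output into a single score is to sum the added and deleted line counts, so that a lower total means a closer match. The sketch below is an assumption-laden example rather than an established similarity metric: the function names and the `binary_penalty` parameter are invented here, and binary files (which `--numstat` reports as `-` `-`) are simply counted as a fixed penalty.

```python
import subprocess


def numstat_distance(text, binary_penalty=1):
    """Sum added + deleted lines from --numstat output (lower = more similar)."""
    total = 0
    for line in text.splitlines():
        parts = line.split("\t", 2)  # added, deleted, path
        if len(parts) < 3:
            continue  # skip blank or unexpected lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            total += binary_penalty  # binary file: no line counts available
        else:
            total += int(added) + int(deleted)
    return total


def tree_distance(old_tree, new_tree, repo="."):
    """Distance between two tree ids, using git diff-tree with rename detection."""
    out = subprocess.run(
        ["git", "-C", repo, "diff-tree", "-r", "-M", "--numstat",
         old_tree, new_tree],
        check=True, capture_output=True, text=True,
    ).stdout
    return numstat_distance(out)


def closest_tree(candidate_trees, new_tree, repo="."):
    """Candidate tree id with the smallest distance, or None if there are none."""
    return min(candidate_trees,
               key=lambda tree: tree_distance(tree, new_tree, repo),
               default=None)
```

A raw line count favours small trees, so it may be worth dividing by the total number of lines in the new subtree, or refusing any parent whose distance exceeds some cutoff and falling back to an orphan commit; both the normalisation and the cutoff are left as choices for the reader.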

Philip Oakley
  • I'm currently using something like you proposed: `git diff-tree -r -M80% -C80% --numstat` and extract the number of lines added/deleted. While it works somehow I thought there might be a more sophisticated option. – Onur May 16 '12 at 08:26