In the git documentation of the blame command it says that (emphasis mine):

-C[<num>]

In addition to -M, detect lines moved or copied from other files that were modified in the same commit. This is useful when you reorganize your program and move code around across files. When this option is given twice, the command additionally looks for copies from other files in the commit that creates the file. When this option is given three times, the command additionally looks for copies from other files in any commit. [...]

First of all, what does "other files in the commit that creates the file" mean? Does it look for files that were modified in the same commit in which the file was added? Or does it look for all the files that are simply present in the tree of the commit in which the file was added?

I tried an example in which I create a non-empty file (named source) and a second file with only two lines (named dest) in the same commit. In the following commit I do something irrelevant. In the third and last commit I copy a few lines from source to dest. The result is shown below (the content is rather long, to avoid confusing git's algorithm; I apologize for that):

$ git init
$ cat source
--incremental         Show blame entries as we find them, incrementally
-b                    Show blank SHA-1 for boundary commits (Default: off)
--root                Do not treat root commits as boundaries (Default: off)
--show-stats          Show work cost statistics
--score-debug         Show output score for blame entries
-f, --show-name       Show original filename (Default: auto)
-n, --show-number     Show original linenumber (Default: off)
-p, --porcelain       Show in a format designed for machine consumption
--line-porcelain      Show porcelain format with per-line commit information
-c                    Use the same output mode as git-annotate (Default: off)
-t                    Show raw timestamp (Default: off)
-l                    Show long commit SHA1 (Default: off)
-s                    Suppress author name and timestamp (Default: off)
-e, --show-email      Show author email instead of name (Default: off)
-w                    Ignore whitespace differences
--minimal             Spend extra cycles to find better match
-S <file>             Use revisions from <file> instead of calling git-rev-list
--contents <file>     Use <file>'s contents as the final image
-C[<score>]           Find line copies within and across files
-M[<score>]           Find line movements within and across files
-L <n,m>              Process only line range n,m, counting from 1
--abbrev[=<n>]        use <n> digits to display SHA-1s

$ cat dest
first initial line in dest
second initial line in dest

$ git add source dest
$ git commit -m "Add source and dest files"
$ touch new-file
$ git add new-file
$ git commit -m "Add irrelevant file"
$ (copy some lines from source to dest)
$ cat dest
first initial line in dest
--show-stats          Show work cost statistics
--score-debug         Show output score for blame entries
-f, --show-name       Show original filename (Default: auto)
-n, --show-number     Show original linenumber (Default: off)
-p, --porcelain       Show in a format designed for machine consumption
--line-porcelain      Show porcelain format with per-line commit information
-c                    Use the same output mode as git-annotate (Default: off)
-t                    Show raw timestamp (Default: off)
-l                    Show long commit SHA1 (Default: off)
-s                    Suppress author name and timestamp (Default: off)
-e, --show-email      Show author email instead of name (Default: off)
-w                    Ignore whitespace differences
--minimal             Spend extra cycles to find better match
-S <file>             Use revisions from <file> instead of calling git-rev-list
--contents <file>     Use <file>'s contents as the final image
second initial line in dest

$ git add dest
$ git commit -m "Copy lines from source to dest"
$ git log --pretty=oneline
6b0f18daaf83ec83d3f53b4a43f4188de3ce87e6 Copy lines from source to dest
f1d66ad3dacb8e589747ed02b42d9135081b3704 Add irrelevant file
2b8275dc73ffd88d7adb6f90a2050ef14088019a Add source and dest files

$ git blame dest
^2b8275d -  1) first initial line in dest
6b0f18da -  2)     --show-stats          Show work cost statistics
6b0f18da -  3)     --score-debug         Show output score for blame entries
6b0f18da -  4)     -f, --show-name       Show original filename (Default: auto)
6b0f18da -  5)     -n, --show-number     Show original linenumber (Default: off)
6b0f18da -  6)     -p, --porcelain       Show in a format designed for machine consumption
6b0f18da -  7)     --line-porcelain      Show porcelain format with per-line commit information
6b0f18da -  8)     -c                    Use the same output mode as git-annotate (Default: off)
6b0f18da -  9)     -t                    Show raw timestamp (Default: off)
6b0f18da - 10)     -l                    Show long commit SHA1 (Default: off)
6b0f18da - 11)     -s                    Suppress author name and timestamp (Default: off)
6b0f18da - 12)     -e, --show-email      Show author email instead of name (Default: off)
6b0f18da - 13)     -w                    Ignore whitespace differences
6b0f18da - 14)     --minimal             Spend extra cycles to find better match
6b0f18da - 15)     -S <file>             Use revisions from <file> instead of calling git-rev-list
6b0f18da - 16)     --contents <file>     Use <file>'s contents as the final image
^2b8275d - 17) second initial line in dest
$ git blame -C dest
^2b8275d -  1) first initial line in dest
6b0f18da -  2)     --show-stats          Show work cost statistics
6b0f18da -  3)     --score-debug         Show output score for blame entries
6b0f18da -  4)     -f, --show-name       Show original filename (Default: auto)
6b0f18da -  5)     -n, --show-number     Show original linenumber (Default: off)
6b0f18da -  6)     -p, --porcelain       Show in a format designed for machine consumption
6b0f18da -  7)     --line-porcelain      Show porcelain format with per-line commit information
6b0f18da -  8)     -c                    Use the same output mode as git-annotate (Default: off)
6b0f18da -  9)     -t                    Show raw timestamp (Default: off)
6b0f18da - 10)     -l                    Show long commit SHA1 (Default: off)
6b0f18da - 11)     -s                    Suppress author name and timestamp (Default: off)
6b0f18da - 12)     -e, --show-email      Show author email instead of name (Default: off)
6b0f18da - 13)     -w                    Ignore whitespace differences
6b0f18da - 14)     --minimal             Spend extra cycles to find better match
6b0f18da - 15)     -S <file>             Use revisions from <file> instead of calling git-rev-list
6b0f18da - 16)     --contents <file>     Use <file>'s contents as the final image
^2b8275d - 17) second initial line in dest
$ git blame -C -C dest
^2b8275d -  1) first initial line in dest
6b0f18da -  2)     --show-stats          Show work cost statistics
6b0f18da -  3)     --score-debug         Show output score for blame entries
6b0f18da -  4)     -f, --show-name       Show original filename (Default: auto)
6b0f18da -  5)     -n, --show-number     Show original linenumber (Default: off)
6b0f18da -  6)     -p, --porcelain       Show in a format designed for machine consumption
6b0f18da -  7)     --line-porcelain      Show porcelain format with per-line commit information
6b0f18da -  8)     -c                    Use the same output mode as git-annotate (Default: off)
6b0f18da -  9)     -t                    Show raw timestamp (Default: off)
6b0f18da - 10)     -l                    Show long commit SHA1 (Default: off)
6b0f18da - 11)     -s                    Suppress author name and timestamp (Default: off)
6b0f18da - 12)     -e, --show-email      Show author email instead of name (Default: off)
6b0f18da - 13)     -w                    Ignore whitespace differences
6b0f18da - 14)     --minimal             Spend extra cycles to find better match
6b0f18da - 15)     -S <file>             Use revisions from <file> instead of calling git-rev-list
6b0f18da - 16)     --contents <file>     Use <file>'s contents as the final image
^2b8275d - 17) second initial line in dest
$ git blame -C -C -C dest
^2b8275d dest   -  1) first initial line in dest
^2b8275d source -  2)     --show-stats          Show work cost statistics
^2b8275d source -  3)     --score-debug         Show output score for blame entries
^2b8275d source -  4)     -f, --show-name       Show original filename (Default: auto)
^2b8275d source -  5)     -n, --show-number     Show original linenumber (Default: off)
^2b8275d source -  6)     -p, --porcelain       Show in a format designed for machine consumption
^2b8275d source -  7)     --line-porcelain      Show porcelain format with per-line commit information
^2b8275d source -  8)     -c                    Use the same output mode as git-annotate (Default: off)
^2b8275d source -  9)     -t                    Show raw timestamp (Default: off)
^2b8275d source - 10)     -l                    Show long commit SHA1 (Default: off)
^2b8275d source - 11)     -s                    Suppress author name and timestamp (Default: off)
^2b8275d source - 12)     -e, --show-email      Show author email instead of name (Default: off)
^2b8275d source - 13)     -w                    Ignore whitespace differences
^2b8275d source - 14)     --minimal             Spend extra cycles to find better match
^2b8275d source - 15)     -S <file>             Use revisions from <file> instead of calling git-rev-list
^2b8275d source - 16)     --contents <file>     Use <file>'s contents as the final image
^2b8275d dest   - 17) second initial line in dest

As can be seen, git blame -C -C dest does not detect that the new lines in dest originate from source, a file that was created with those lines in the same commit as dest. However, git blame -C -C -C dest gives the expected output.

Am I doing something wrong?

Thank you.

EDIT:

I believe

[...] When this option is given twice, the command additionally looks for copies from other files in the commit that creates the file. [...]

means that git will look for lines copied from other files only in the commit that actually created the file. It does not mean that git will look, in every commit, for lines copied from the files that were present in the commit that created the file. This misunderstanding is related to my first question.

user42768
  • Re your edit: yes, that's what `-C` means: check other files in the commit that creates a new file, to see if some part(s) of the new files were copied from existing files. – torek May 16 '18 at 21:17

1 Answer

This seems to be an assumption on git's part: you can't copy from something that doesn't exist yet. If you create the source file in an earlier commit, then -C -C should work. If you also modify source in the commit that copies the lines, a single -C is enough.

If git didn't work this way, they would have to concern themselves with the order in which the changes were applied within a commit, which might make the code a bit more hairy (in your example: which file were the lines copied from and to? How would git know since they were both created in the same commit?).
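To check the single -C case yourself, here is a minimal sketch. It assumes a throwaway repository in a temporary directory, a made-up user identity, and a shortened source file; none of these come from the question. The copying commit also appends a line to source, so source counts as "modified in the same commit".

```shell
#!/bin/sh
# Sketch: a single -C should find the copy when source is modified
# in the same commit that copies the lines into dest.
set -eu
repo=$(mktemp -d)                       # throwaway repository (assumption)
cd "$repo"
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

# Commit 1: source and dest are created together, as in the question.
cat > source <<'EOF'
--show-stats          Show work cost statistics
--score-debug         Show output score for blame entries
-f, --show-name       Show original filename (Default: auto)
-n, --show-number     Show original linenumber (Default: off)
-p, --porcelain       Show in a format designed for machine consumption
EOF
printf '%s\n' 'first initial line in dest' 'second initial line in dest' > dest
git add source dest
git commit -q -m "Add source and dest files"

# Commit 2: copy lines 2-4 of source into dest AND modify source.
{
  echo 'first initial line in dest'
  sed -n '2,4p' source
  echo 'second initial line in dest'
} > dest.new
mv dest.new dest
echo '-t                    Show raw timestamp (Default: off)' >> source
git add source dest
git commit -q -m "Copy lines from source to dest and modify source"

# The copied lines should be attributed to source here.
git blame -C dest
```

If you drop the line that appends to source, the same git blame -C dest should fall back to blaming the copying commit, which matches what you saw in your own experiment.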

To answer your first question based on your own experimental results:

When this option is given twice, the command additionally looks for copies from other files in the commit that creates the file

This means any file that existed in the tree before the commit that creates the file.

-C -C -C, by contrast, checks all commits, and so also checks the commit that created the file.
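A small sketch of the -C -C case, under my own assumptions: source is committed first, and dest is created in a later commit that already contains the copied lines. The temporary repository, the user identity and the shortened file contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: -C -C should find the copy when dest is created in a commit
# whose parent already contains source (source itself is left untouched).
set -eu
repo=$(mktemp -d)                       # throwaway repository (assumption)
cd "$repo"
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

# Commit 1: only source exists.
cat > source <<'EOF'
--incremental         Show blame entries as we find them, incrementally
--show-stats          Show work cost statistics
--score-debug         Show output score for blame entries
-f, --show-name       Show original filename (Default: auto)
-n, --show-number     Show original linenumber (Default: off)
-p, --porcelain       Show in a format designed for machine consumption
EOF
git add source
git commit -q -m "Add source"

# Commit 2: dest is created, already containing lines 2-5 of source.
{
  echo 'first initial line in dest'
  sed -n '2,5p' source
  echo 'second initial line in dest'
} > dest
git add dest
git commit -q -m "Add dest with lines copied from source"

echo '--- git blame -C dest'
git blame -C dest        # source was not modified, so one -C should not help
echo '--- git blame -C -C dest'
git blame -C -C dest     # this commit creates dest, so -C -C should find source
```

In your original history dest was created in the root commit, before the lines were copied, so the commit that creates dest has nothing to copy from and only -C -C -C reaches source.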

DylanYoung