
For statistical purposes I need to count precisely the number of characters contributed by a developer to the current state of a git repo (HEAD).

The closest I could get was this command:

wc -m `git log --no-merges --author="SomeDev" --name-only --pretty=format:"" | sort -u`

There are some problems with this approach:

  1. even if several people contributed to a file, all of its characters are counted for SomeDev,
  2. it fails when a file touched by SomeDev was renamed at some stage, because wc is given the old path, which no longer exists.

Can git blame be used for this somehow? On the one hand it can track file renames, but on the other it seems to attribute a whole line to the last committer even if they changed only a few characters in it and the rest were contributed by earlier committers.

user1876484
  • My 2 cents: git does not store enough information to identify the exact contributor of each character in each file of the repo, so be aware that any solution will only be "good enough". For example, commits can be rebased, cherry-picked or amended, and can have an Author field different from the Committer field, so you won't know who wrote which part of the commit. Likewise, git does not track the history of individual files, so "renaming" is just guessing after the fact. Copy/pasting a file and editing one line, for example, will list the new file as entirely created by its author. – LeGEC Feb 17 '21 at 14:16
  • @LeGEC: let's assume only merges are used. Maybe the number of chars a developer contributed to a line can be reconstructed by diffing against previous commits (+ git blame). – user1876484 Feb 17 '21 at 14:19

1 Answer


Note: as I said in my comment, there is no exact way to establish the author of each individual character.

You would have to look at the diffs for each individual file and work out which author wrote which character.

You can get the list of commits that touched a single file:

git log --format="%h" -- that/file

and work backwards on that list.

You can also ask git log to output the diffs on a file directly:

git log -p -- that/file

# you can add options for 'git diff', like '-U0' to discard context lines:
git log -U0 -p -- that/file

# and use '--format' to customize the data displayed for each commit:
git log --format="commit: %h%nauthor: %an" ...

As far as I can see, you would have to parse those diffs to establish the "author" of each individual character.
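
The parsing step could be sketched roughly as follows. This is only a minimal sketch, not a complete solution. It assumes the log was produced with --word-diff=porcelain and --word-diff-regex=. (so that git treats every character as a "word") and with an 'author: %an' format line. The helper name count_added_chars is made up for illustration. It only sums the characters each author added over the history of the file: it does not tell you which of them survive in HEAD, and it ignores newlines (the '~' lines of the porcelain format).

```shell
#!/bin/sh
# Hypothetical helper: sum the characters each author ADDED, commit by commit.
# It reads, on stdin, the output of a command such as:
#   git log --no-merges --format='author: %an' -p -U0 \
#       --word-diff=porcelain --word-diff-regex=. -- that/file
# and prints one "<author><TAB><count>" line per author.
# Note: some awk implementations make length() count bytes, not characters.
count_added_chars() {
    awk '
        # Our custom header line: remember who wrote the commit that follows.
        /^author: /      { author = substr($0, 9); in_hunk = 0; next }
        # A new file header starts; the "---" / "+++" lines are not content.
        /^diff --git /   { in_hunk = 0; next }
        # From the hunk header on, every line is prefixed by +, -, space or ~.
        /^@@/            { in_hunk = 1; next }
        # "+" lines carry added text; drop the one-character prefix.
        in_hunk && /^\+/ { added[author] += length($0) - 1 }
        END { for (a in added) printf "%s\t%d\n", a, added[a] }
    ' | LC_ALL=C sort
}
```

To find out which of those characters are still present in HEAD, you would additionally have to replay the diffs in chronological order and keep track, for every character, of who inserted it and whether a later commit removed it.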

LeGEC