
I'd like to track how the line counts of a given set of files change over the history of a git repository. For the current working copy I can call wc -l $FILES, but how can I get the counts, together with a timestamp, for each commit, so that I can generate a graph of growing and shrinking files?

Jakob

3 Answers


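You can use git rev-list HEAD to get a list of all commit hashes, then run git checkout for each commit and execute wc -l $FILES. Note that this rewrites the working copy and leaves the repository on a detached HEAD, so start from a clean tree and check out your branch again afterwards.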

Example:

$ export FILES="CMakeLists.txt README.md TODO.md"
$ git rev-list --reverse HEAD | xargs -I '{}' sh -c 'git checkout {} 2>/dev/null; git log -1 --pretty=%ci; wc -l $FILES 2>/dev/null'

Output:

...
2014-12-24 01:18:34 +0400
     31 CMakeLists.txt
    126 README.md
      3 TODO.md
    160 total
2014-12-24 01:57:36 +0400
     35 CMakeLists.txt
    126 README.md
      3 TODO.md
    164 total
2014-12-24 15:04:10 +0400
     35 CMakeLists.txt
    126 README.md
      2 TODO.md
    163 total
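
For plotting, the output above can be flattened into one row per file and commit. The following is a minimal sketch, not part of the original command: the function name wc_log_to_csv is made up for illustration, and it assumes file names without spaces and timestamps in the %ci format shown above.

```shell
# wc_log_to_csv (hypothetical helper): read the timestamp + `wc -l` output
# on stdin and print one "timestamp,file,lines" row per file.
wc_log_to_csv() {
  awk '
    # A line starting with YYYY-MM-DD is a commit timestamp (%ci format).
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { ts = $0; next }
    # Skip the summary line that wc prints when given several files.
    NF == 2 && $2 == "total" { next }
    # Everything else of the form "<count> <file>" becomes a CSV row.
    NF == 2 && ts != "" { print ts "," $2 "," $1 }
  '
}
```

Appending `| wc_log_to_csv` to the pipeline above should then yield rows such as `2014-12-24 01:18:34 +0400,README.md,126`. A file that is literally named "total" would be dropped by this sketch.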
Stas

I would recommend using git log to achieve what you want. Specifically running:

git log --numstat --pretty=format:

This will give you output similar to:

10     8 pom.xml
0      6 other/pom.xml

--numstat will give you the number of insertions (first column) and deletions (second column). Using the above example the pom.xml had 10 insertions and 8 deletions and the other/pom.xml had 0 insertions and 6 deletions. With the empty format you'll only see output of files as they have changed over time. You could mess with that format to give you the sha1 of the commit or the date or whatever is useful for your needs.

git log --numstat --pretty=format:%ad #will give you the date
git log --numstat --pretty=format:%H  #will give you the sha1

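You'll need to parse this data a little to get what you want (the numbers are per-commit changes, so a file's line count at a given commit is the running sum of insertions minus deletions), but I believe this is the information you are asking for.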
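
The parsing step could look roughly like the sketch below. It is an illustration under assumptions, not a tested tool: the function name numstat_totals and the "commit " prefix in the format string are invented here, the log must be listed oldest first (--reverse), and --no-renames is used so that a rename shows up as a plain delete plus add instead of an "old => new" path.

```shell
# numstat_totals (hypothetical helper): read the output of
#   git log --reverse --no-renames --numstat --pretty=format:'commit %ci'
# on stdin and print "timestamp,file,lines" after every change to a file.
numstat_totals() {
  awk -F '\t' '
    # Header line produced by the custom format: "commit <timestamp>".
    /^commit / { ts = substr($0, 8); next }
    # numstat rows have three tab-separated fields; "-" marks a binary file.
    NF == 3 && $1 != "-" {
      lines[$3] += $1 - $2          # running total = insertions - deletions
      print ts "," $3 "," lines[$3]
    }
  '
}
```

A possible invocation is `git log --reverse --no-renames --numstat --pretty=format:'commit %ci' -- $FILES | numstat_totals`. Merge commits print no numstat rows by default, so changes introduced only in a merge resolution would not be counted.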

Jarred Olson

The approach taken by @Stas is useful, but it checks out each revision and counts the lines in the local git working copy. I took it one step further and changed it to not touch the working copy at all; it also produces comma-separated output, which can easily be imported into Excel or similar.

Note: my version is limited to a single file though. Just run it with different FILE multiple times if you need data for multiple files.

Here's the command:

export FILE=README.md; git rev-list --reverse HEAD -- "$FILE" | xargs -I '{}' sh -c 'TIME=$(git log -1 --pretty=%ci {}); LENGTH=$(git show "{}:$FILE" | wc -l); echo $TIME,$LENGTH'

This will give you an output like this:

2019-02-06 13:51:53 +0200,40
2019-02-06 16:01:13 +0200,40
2019-03-14 13:42:45 +0200,40
2019-03-14 14:48:37 +0200,40
2019-03-19 12:11:54 +0200,40
2019-04-02 13:31:43 +0300,39
2019-04-30 08:51:15 +0300,39
2019-05-08 16:37:01 +0300,39
2019-06-04 09:49:13 +0300,39
2019-06-28 14:51:38 +0300,41
2019-09-24 12:59:35 +0300,41
2019-09-30 11:21:54 +0300,41
2019-11-04 11:13:28 +0200,42
2019-11-15 10:35:17 +0200,42
2019-11-27 14:36:08 +0200,42
2020-03-12 14:56:51 +0200,43
2020-04-24 09:46:58 +0300,43
2020-05-05 12:26:34 +0300,43
2020-05-25 12:57:20 +0300,43
2020-06-04 11:19:41 +0300,43
2020-08-03 10:20:58 +0300,43
2021-02-15 11:22:22 +0200,50
2021-02-26 16:17:09 +0200,50
2021-03-10 09:31:10 +0200,50
2021-03-10 15:44:40 +0200,50
2021-05-21 08:21:48 +0300,50
2021-05-24 10:10:31 +0000,50
2021-05-25 12:34:20 +0000,50
2021-05-28 11:51:27 +0300,50
2021-08-17 10:38:43 +0300,50
2021-10-14 06:49:05 +0000,53
2021-10-14 06:51:33 +0000,53
2021-10-14 09:11:45 +0000,53
2021-10-19 12:33:18 +0300,53
2021-10-19 12:57:48 +0300,53
2021-10-27 13:34:38 +0300,53
2022-01-27 12:55:09 +0000,53
2022-01-28 08:03:25 +0000,53
2022-02-25 11:43:10 +0000,53
2022-02-28 10:38:44 +0000,53
2022-03-30 10:48:26 +0000,53
2022-04-21 05:45:41 +0000,53
2022-04-21 12:37:01 +0000,53
2022-05-10 11:34:18 +0000,53
2022-10-06 10:10:47 +0000,53
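
To cover several files in one run, the same idea can be wrapped in a loop. This is only a sketch: the function name line_history is made up, paths are assumed to be relative to the repository root, and the function is assumed to be run from that root.

```shell
# line_history (hypothetical helper): print "timestamp,file,lines" for every
# commit that touched each given file, oldest first, without checking
# anything out.
line_history() {
  for file in "$@"; do
    git rev-list --reverse HEAD -- "$file" |
    while read -r commit; do
      # Skip commits in which the file does not exist (e.g. it was deleted).
      git cat-file -e "$commit:$file" 2>/dev/null || continue
      time=$(git log -1 --pretty=%ci "$commit")
      # tr strips the padding that some wc implementations add.
      length=$(git show "$commit:$file" | wc -l | tr -d ' ')
      echo "$time,$file,$length"
    done
  done
}
```

For example, `line_history README.md TODO.md > history.csv` would write one row per file and commit, with the file name as an extra middle column.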
Per Lundberg