3

Given a single branch in a git repository, how do I grab all versions of a file? The final desired output is one new output file per version (although printing all of the versions to STDOUT is fine as well, as long as it's easy to determine the extent of each version).

Here's what I'm currently using:

branch=master
filename=...whatever...

# Keep the rev-list lines that mention the file, then take the leading 40-character hash of each
myids=( $(git rev-list --objects "$branch" -- "$filename" | grep "$filename" | perl -e 'for (<>) { print substr($_, 0, 40) . "\n"; }') )

for (( i=0,j=1; i < ${#myids[@]}; i++,j++ ))
do
    echo "$i $j"
    name="output_$j.txt"
    git cat-file -p "${myids[$i]}" > "$name"
done

Explanation of what I'm doing:

  • use rev-list to find relevant hashes
  • strip the hashes of commits and trees, leaving just the hashes of the files (these lines also include the filename)
  • strip the filenames
  • run the hashes through cat-file and generate output files

The two main problems I have with this are that 1) it's not robust, because the grep could have false positives, and I'm not sure if it would follow file renames, and 2) it feels super hacky.

Is there a better way to do this, perhaps making better use of git's API?

fpietka
Matt Fenwick

2 Answers

2

Not the best solution, but it's one possibility

If this is just a one-off task, you could try the following script. It relies on Git porcelain commands as they behave in version 1.9.4, so it's not a robust, reliable solution: the results depend on which version of Git you're using.

#!/bin/bash
# Write each committed version of the file to temp/<sha>.file
mkdir -p temp
filepath=osx/.gitconfig

for sha in $(git log --format="%H" -- "$filepath"); do
  git show "$sha:$filepath" > "temp/$sha.file"
done

It simply uses git log to find all commits that modified the file:

$ git log --format="%H" osx/.gitconfig
338243aa6b68edad1dc3b2eebf66e108e9a4d685
7a4667138a519691386940ac23f9c8271ce14c77
475593a612141506f59a141e38b8c6a3a2917f85
03fa0711032cfdfc37fb431d60567ef22d75c7e5
3f7d8f0fc7e1d7a614f2aef8f53947ec2ce61296
c5fef8fccef3fc13f9dea17db209f2ceaab70002
287dadd8bcaf7e9197c6a16d57d3bacb72a41812
1f34ee1ab6965635a8f412bf3387f9dfdf197a1d

It then uses the <revision>:<filepath> syntax to output the file as it existed at each of those revisions.

Note that git log can simplify the history it shows when it is given a path, so you might want to pass the --full-history flag to git log, though I'm not sure whether that's necessary for this particular use case.

How well would this follow renamed files though? You'd probably need to make the script a little smarter about keeping track of that, in order to use the right file path.
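As a rough sketch of what "a little smarter" could look like: `git log --follow --name-only` prints, for each commit, the path the file had at that point, so the script can use that path instead of a fixed one. The function name `dump_versions_following_renames` and the `@` marker in the format string are my own choices for illustration, and this assumes paths without unusual characters (Git may quote those in its output). It is still porcelain, so the same caveats about Git versions apply.

```shell
#!/bin/bash
# Hypothetical helper: write every version of a file to <outdir>/output_N.txt,
# newest first, using the path the file had in each commit.
dump_versions_following_renames() {
    local filepath="$1" outdir="$2"
    local i=0 sha="" line
    mkdir -p "$outdir"
    # --follow tracks the file across renames; --name-only prints the file's
    # path in each commit. Commit lines are marked with a leading "@".
    git log --follow --format='@%H' --name-only -- "$filepath" |
    while IFS= read -r line; do
        case "$line" in
            @*) sha="${line#@}" ;;   # commit line: remember the hash
            '') ;;                   # blank separator line: ignore
            *)  # path line: skip it if the file does not exist in that commit
                git cat-file -e "$sha:$line" 2>/dev/null || continue
                i=$((i + 1))
                git show "$sha:$line" > "$outdir/output_$i.txt" ;;
        esac
    done
}
```

For example, `dump_versions_following_renames osx/.gitconfig temp` would be called from the repository root.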

Again, however, I'd like to emphasize that this is not the best solution. A better one would use Git plumbing commands instead, since they are less dependent on the Git version and more backward and forward compatible.
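A minimal sketch of what such a plumbing-based variant might look like, using `git rev-list` to find the commits on one branch that touched the path and `git cat-file` to read the blob at `<revision>:<filepath>`. The function name `dump_versions_plumbing` is hypothetical, the script assumes it is run from the repository root, and it does not follow renames, since `rev-list` has no equivalent of `--follow`.

```shell
#!/bin/bash
# Hypothetical helper: write every version of a file on one branch to
# <outdir>/output_N.txt, newest first, using plumbing commands only.
dump_versions_plumbing() {
    local branch="$1" filepath="$2" outdir="$3"
    local i=0 sha
    mkdir -p "$outdir"
    # rev-list prints the commits reachable from $branch that touched the path.
    git rev-list "$branch" -- "$filepath" |
    while IFS= read -r sha; do
        # Skip commits in which the path does not exist (e.g. a deletion).
        git cat-file -e "$sha:$filepath" 2>/dev/null || continue
        i=$((i + 1))
        git cat-file blob "$sha:$filepath" > "$outdir/output_$i.txt"
    done
}
```

For example: `dump_versions_plumbing master osx/.gitconfig temp`. Unlike the asker's `rev-list --objects | grep` pipeline, this never matches on the filename as text, so it avoids the false positives from grep.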


  • Nice! I bet adding `--follow` would get it to follow renames. – Matt Fenwick Aug 02 '14 at 03:09
  • @MattFenwick ok...just keep in mind that this is probably *not* the best solution. There should be a better plumbing way to do this. –  Aug 02 '14 at 03:14
1

I would suggest using git-log on the file itself, instead of git-rev-list:

#!/bin/bash

filename=...whatever...

# Number the output files from 1, newest version first
i=1
for hash in $(git log --pretty=format:%H -- "$filename")
do
    git show "$hash:$filename" > "output_$i.txt"
    ((i++))
done
fpietka
  • So you wouldn't know this because you can't see deleted answers, but [this was basically my answer](http://stackoverflow.com/a/25087993/456814), until I decided that I wasn't sure if it would follow renames, like what the original poster asked for. Also in my answer, I pointed out that it uses Git porcelain instead of plumbing, so it's extremely brittle and dependent on what Git version that you're using, so it's not the best solution, though it's definitely one (sub-optimal) alternative. –  Aug 01 '14 at 22:45