We want to run git filter-branch over a large codebase to reformat its PHP files. The repository has over 21k commits, and as things stand phpcbf would reformat the whole codebase at every commit that filter-branch visits. Is it possible to get just the files that changed in each commit and format only those? Something like...

git filter-branch --tree-filter \
 'FILES=$(<something> | grep "\.php$"); php /usr/local/bin/phpcbf.phar $FILES || true'
Elliot Chance

1 Answer

I found the solution:

git filter-branch --tree-filter 'phpcbf $(\
  git show $GIT_COMMIT --name-status | egrep "^[AM]" |\
    grep "\.php$" | cut -f2)' -- --all

Just to give a quick overview of what it’s doing:

  • git show $GIT_COMMIT --name-status lists the files changed in that commit, each prefixed with a status letter and a tab.
  • egrep "^[AM]" keeps only the Added and Modified statuses. There is no need to try to format files that are being Deleted.
  • grep "\.php$" keeps only PHP files. The escaped dot and the $ anchor stop it from matching paths that merely contain "php".
  • cut -f2 removes the status prefix from the list so we just get the raw file paths.
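The filtering steps described above can be tried on sample input without touching a real repository. This is only a minimal sketch: the function names and the sample paths are made up for illustration, and it assumes the status letter and the path are separated by a tab, as in `git show --name-status` output.

```shell
# Hypothetical helper: reads `git show --name-status` style lines on stdin
# and prints the paths of added or modified PHP files.
changed_php_files() {
  # grep -E is the current spelling of egrep.
  grep -E "^[AM]" | grep "\.php$" | cut -f2
}

# Made-up sample input; the fields are tab-separated.
sample_status() {
  printf 'A\tsrc/New.php\n'
  printf 'M\tsrc/Old.php\n'
  printf 'D\tsrc/Gone.php\n'
  printf 'M\tREADME.md\n'
}

# Prints src/New.php and src/Old.php, one per line.
sample_status | changed_php_files
```

The deleted file and the non-PHP file are dropped, so only paths that still exist in the commit's tree would be handed to the formatter.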

See my blog post for more details: https://elliot.land/post/reformatting-your-codebase-with-git-filter-branch

izstas
Elliot Chance
  • Elliot's solution got me 98% of the way there. It seems like some behavior in `git diff` during a filter-branch has changed (I'm running git 2.10.2). It was constantly comparing to my latest HEAD, rather than the filtered commit's parent. I was able to change Elliot's code slightly to make it work. Instead of `git diff --cached --name-status` during the filter, I used `git show $GIT_COMMIT --name-status`. `$GIT_COMMIT` is automatically set during each step of the `filter-branch`. – Andy Fowler Nov 06 '16 at 19:55
  • I found that `git show $GIT_COMMIT --name-only --pretty=oneline|tail -n +2` is more robust than `git show $GIT_COMMIT --name-status | egrep ^[AM]`. Note that it will also return deleted files, so you might need to test that each file exists. – Gabriel Devillers Dec 08 '21 at 10:32
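The existence check mentioned in the last comment could look roughly like the sketch below. It is an illustration only, not code from the thread: the `existing_php_files` name and the temporary demo files are assumptions, and in a real tree filter the input would come from the `--name-only` command quoted in the comment.

```shell
# Hypothetical helper: reads one path per line on stdin and prints only
# those that end in .php and still exist as regular files.
existing_php_files() {
  while IFS= read -r path; do
    case "$path" in
      *.php)
        if [ -f "$path" ]; then
          printf '%s\n' "$path"
        fi
        ;;
    esac
  done
}

# Demo in a throwaway directory: only Kept.php exists on disk.
demo_dir="$(mktemp -d)"
touch "$demo_dir/Kept.php"
printf '%s\n' "$demo_dir/Kept.php" "$demo_dir/Deleted.php" "$demo_dir/notes.txt" |
  existing_php_files
rm -rf "$demo_dir"
```

Only the path to Kept.php is printed, since Deleted.php is missing from disk and notes.txt is not a PHP file.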