Access the changed files path in git pre-receive hook

Question

I'm writing a git pre-receive hook on the remote repo to make sure pushed code is consistent with our company's internal guidelines.

I'm able to find all the files that need to be checked when the pre-receive hook fires. However, I don't have a path that I can use to open them with normal file operations (e.g. cat git_working_directory/file_name fails with a No such file or directory error). The application that validates the code requires a file path as an argument so that it can open the file and run its checks.

Consider this scenario: a developer creates a new file and pushes it to the server, which triggers the pre-receive hook. At this stage the new file has not been written to the remote's working directory, because the pre-receive hook is still running.

I'm wondering if there is a temporary location where the files are saved in git as soon as they are pushed so that I can pass that directory to the application to run the checks?

Update:

I could check the files out to a temporary location and run the checks there. That is an option, but developers push frequently, sometimes at the same time, and the repo is very big, so it doesn't seem feasible. I'm looking for a solution where I can just use the path to the file, if one is somehow available.

You would need to check the files out into a temporary working directory and run your checks there. — larsks, Feb 09 '15 at 16:39
@larsks is there a git command I can run to check those files out to a temp directory? — aBadAssCowboy, Feb 09 '15 at 16:42
Just to clarify: files you check in to the repository are not kept there "just like that". Some of them are stored as deltas against others, or their contents are compressed. There is no place where these files are guaranteed to exist in a "ready-to-consume" state — joozek, Feb 16 '15 at 09:39
Yes joozek, that is my understanding too. I was just wondering if there is something I missed about how git stores the files and how we can access them. Thanks. — aBadAssCowboy, Feb 16 '15 at 11:57

joozek · Answer 1 · 2015-02-16T13:47:39.953

score 6

I'm wondering if there is a temporary location where the files are saved in git as soon as they are pushed so that I can pass that directory to the application to run the checks?

No, there is no such place. Those files are stored as blobs and can be reduced to deltas and/or compressed, so there's no guarantee they'll be available anywhere in 'ready-to-consume' state.

The application which checks for the standards requires the file path as an argument so that it can open the file and run its checks.

If you're on Linux, you could pass /dev/stdin as the input file and feed the file contents through a pipe.

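#!/bin/sh
while read oldrev newrev refname; do
  # List the files changed between the two revisions and validate each one.
  git diff --name-only "$oldrev" "$newrev" | while read -r file; do
    # Stream the pushed version of the file to the validator via /dev/stdin.
    git show "$newrev:$file" | validate /dev/stdin || exit 1
  done || exit 1  # the inner loop runs in a subshell, so propagate its failure
done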
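
If the validator needs a real, seekable file, or relies on the file extension, /dev/stdin may not be enough, because here it is a pipe. A minimal sketch of a per-file temporary copy follows. `validate` is the same placeholder command as in the snippet above; the function name `validate_via_tempfile` and the temp-directory naming are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: copy one pushed file into a private temp dir, validate it, clean up.
# "validate" stands in for the real checker (placeholder, not a real tool).
validate_via_tempfile() {
  rev=$1
  path=$2
  # mktemp -d gives each call its own directory, so concurrent
  # pushes never share a file.
  tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/prerecv.XXXXXX") || return 1
  # Keep the base name so the checker still sees the original extension.
  tmpfile=$tmpdir/$(basename "$path")
  status=0
  # Extract the blob straight from the object store, then run the checker.
  git show "$rev:$path" > "$tmpfile" && validate "$tmpfile" || status=$?
  rm -rf "$tmpdir"
  return "$status"
}
```

Inside the inner loop this would be called as `validate_via_tempfile "$newrev" "$file" || exit 1`. The copy costs one small file per changed path rather than a full checkout.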

edited Feb 16 '15 at 13:47

answered Feb 09 '15 at 18:41

joozek


Looks like a feasible option, I shall try and let you know. – aBadAssCowboy Feb 16 '15 at 08:52
Now that I think about it I realize this requires a working copy (which should not be present on server). I'll update my answer in a minute to pull directly from repo – joozek Feb 16 '15 at 09:22
@SriVishnuTotakura I edited my answer to remove the requirement for WC to exist. Check it out – joozek Feb 16 '15 at 09:33
Though it works, I think I would still have a problem when multiple pushes occur at the same time. – aBadAssCowboy Feb 16 '15 at 11:59
@SriVishnuTotakura you shouldn't have any problems with that. git show outputs your file at the desired revision (without any temp files) and the revision itself is immutable, so I can't imagine what could go wrong. Could you please provide some more details about what you're worried about? – joozek Feb 16 '15 at 12:10
While the validation is running on one file that was written to /dev/stdin, another user could push a commit, and I think the content of /dev/stdin would be overwritten before the validation program has fully read it. Couldn't that happen? – aBadAssCowboy Feb 16 '15 at 12:14
@SriVishnuTotakura `/dev/stdin` is not a file, it's a [stream](http://en.wikipedia.org/wiki/Standard_streams). When the `validate` process opens `/dev/stdin`, what it really gets is the output of whatever is on the left side of the pipe (`|`), in this case `git-show`. Each hook process has its own stdin, so concurrent pushes don't interfere. – joozek Feb 16 '15 at 12:20
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/71014/discussion-between-sri-vishnu-totakura-and-joozek). – aBadAssCowboy Feb 16 '15 at 12:21
This will fail if either `oldrev` or `newrev` are all zeroes (which will happen on new branches or branch deletion) – Nepoxx Apr 08 '19 at 15:27

score 4 · Answer 2 · answered Sep 10 '19 at 04:54

To follow up on joozek's answer, if you need to check for when oldrev is all zeroes (you are pushing a new branch), you can still read through the new commits by adding the following to joozek's solution:

#!/bin/sh
z40=0000000000000000000000000000000000000000
while read oldrev newrev refname; do
  if [ "$oldrev" = "$z40" ]; then
    # A new branch is being pushed: diff against git's empty tree instead
    oldrev=4b825dc642cb6eb9a060e54bf8d69288fbee4904
  fi
  git diff --name-only "$oldrev" "$newrev" | while read -r file; do
    git show "$newrev:$file" | validate /dev/stdin || exit 1
  done || exit 1  # propagate a failure out of the inner subshell
done

The hash "4b825dc642cb6eb9a060e54bf8d69288fbee4904" is git's empty tree object, which is diff comparable (all zeroes is not). This way you will be able to check all objects being pushed to the new branch.
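
An all-zero old value means the ref is being created, and an all-zero new value means it is being deleted, in which case there is nothing to validate. A small sketch for telling the cases apart follows; `is_zero_rev` and `classify_update` are hypothetical helper names, and matching "only zeroes" rather than exactly 40 of them is a choice made here so the check does not depend on the hash length.

```shell
#!/bin/sh
# Sketch: decide what kind of ref update a pre-receive input line describes.
# is_zero_rev and classify_update are hypothetical helper names.
is_zero_rev() {
  case $1 in
    ''|*[!0]*) return 1 ;;  # empty, or contains a non-zero character
    *) return 0 ;;          # nothing but zeroes
  esac
}

classify_update() {
  oldrev=$1
  newrev=$2
  if is_zero_rev "$newrev"; then
    echo delete   # ref is being removed: no files to validate
  elif is_zero_rev "$oldrev"; then
    echo create   # new ref: no old revision to diff against
  else
    echo update
  fi
}
```

In the hook loop, a `delete` result can simply `continue` to the next input line before any `git diff` or `git show` is attempted.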

While this empty object hash is useful, be careful when using it. It can get computationally expensive if the commit/branch being pushed is large, since you are checking every object within.
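
One way to limit that cost is to look only at commits that are not yet reachable from any existing ref, instead of diffing the whole tree against the empty tree. The sketch below assumes it runs inside pre-receive, where the refs have not been updated yet; `new_commits` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list only the commits that this push introduces.
# new_commits is a hypothetical helper name.
new_commits() {
  oldrev=$1
  newrev=$2
  case $oldrev in
    *[!0]*)
      # Existing ref: the usual old..new range.
      git rev-list "$oldrev..$newrev" ;;
    *)
      # New ref (all-zero old value): everything reachable from newrev
      # that no existing ref can already reach.
      git rev-list "$newrev" --not --all ;;
  esac
}
```

Each commit printed can then be inspected on its own, so a new branch that forks from an already-validated history only pays for its own commits.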

Usually /bin/sh does not support `==`; please use `=` instead — kvaps, Dec 09 '20 at 18:54

score 3 · Answer 3 · answered Feb 09 '15 at 23:40

The pushed files are saved in a permanent location, namely as git objects in the repository. One way to extract them is using git archive.

#! /usr/bin/perl -T

use strict;
use warnings;

# replace with your real check program and optional arguments
my @CHECK_PROGRAM = qw/ ls -la /;
#my @CHECK_PROGRAM = qw/ false /;

use File::Temp qw/ tempdir /;

$ENV{PATH} = "/bin:/usr/bin";

while (<>) {
  # <old-value> SP <new-value> SP <ref-name> LF
  my($oldsha,$newsha,$refname) = /\A([^ ]+) ([^ ]+) ([^ ]+)\x0A\z/;
  die "$0: unexpected input: $_" unless defined $refname;

  my $tmp = tempdir "prerecv-$$-XXXXXX", DIR => "/tmp", CLEANUP => 1;

  system(qq{git archive --format=tar $newsha | tar -C "$tmp" -x}) == 0
    or die "$0: git-archive $newsha failed";

  system(@CHECK_PROGRAM, $tmp) == 0 or die "$0: check failed";
}

Because the code runs on behalf of another user, it enables Perl’s taint mode security feature with the -T switch.

Note that your check program will see the entire tree as pushed without knowledge of which files have changed. If the checker requires information about the delta as well, investigate git diff-tree and possibly its --diff-filter option.
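
As a rough illustration of that suggestion, the following sketch lists the paths that differ between two revisions and leaves out deletions. `changed_files` is a hypothetical helper name, and the choice of filter letters is an assumption about what the checker cares about.

```shell
#!/bin/sh
# Sketch: print the paths that were added or changed between two revisions.
# changed_files is a hypothetical helper name.
changed_files() {
  # -r recurses into subdirectories; --diff-filter=ACMR keeps added, copied,
  # modified and renamed paths and drops deletions, which have no content
  # left to check.
  git diff-tree -r --name-only --diff-filter=ACMR "$1" "$2"
}
```

Called as `changed_files "$oldsha" "$newsha"`, it prints one path per line, which can be passed to the checker together with the extracted tree.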

isn't `git archive` similar to making copies of the files? Would this be optimal considering that developers push code very frequently on to the server, sometimes even at the same time ? I'm looking for a location where they are already saved by git and which I can access. — aBadAssCowboy, Feb 16 '15 at 08:55
@SriVishnuTotakura You cannot count on any such directory location being associated with a remote repository, so the hook above copies the state of the directory tree associated with the commit to a temporary location so that your checker or validation application can do its job. Git updates are atomic, so the hook does not have to worry about concurrent pushes. If the directory tree being tracked is huge, say 1+ GiB, you may have a concern, but even then I’d suggest reorganizing your repository structure. — Greg Bacon, Feb 16 '15 at 14:18

score 1 · Answer 4 · answered Feb 09 '15 at 17:09

You would probably need to write a pre-receive hook that checks out the files in a temporary location. Generally, that means you would git clone the bare repository into a temporary directory, and then check out the specific commit and run your checks. For example, something like:

#!/bin/sh

REPO=$PWD

check_files_in () {
  rev=$1

  # create a temporary working directory (absolute path, so the
  # cleanup below still finds it after changing directory)
  workdir=$(mktemp -d "${TMPDIR:-/tmp}/gitXXXXXX") || return 1
  (
    # arrange to clean up the working directory
    # when the subshell exits
    trap 'cd /; rm -rf "$workdir"' EXIT

    # unset GIT_DIR because it would confuse things
    unset GIT_DIR

    # clone the repository
    cd "$workdir" || exit 1
    git clone "$REPO" check || exit 1

    # checkout the specific revision we're checking
    cd check || exit 1
    git checkout "$rev" || exit 1

    # perform some sort of validation.  The exit code of this 
    # command will be the exit code of this function, so
    # returning an error will reject the push.
    run-validation-scripts
  )
}

while read oldrev newrev refname; do
  check_files_in $newrev || exit 1
done
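
If cloning the whole repository on every push is too heavy, a lighter variant is to write only the changed files into a temporary directory, keeping their relative paths, and point the checker at that directory. This is a sketch under a few assumptions: `export_changed_files` is a hypothetical name, file names contain no newlines, and the shell runs the loop of a pipeline in a subshell (as dash and bash do).

```shell
#!/bin/sh
# Sketch: materialise only the files changed between two revisions.
# export_changed_files is a hypothetical helper name.
export_changed_files() {
  oldrev=$1
  newrev=$2
  dest=$3
  # ACMR skips deleted paths, which have no content in the new revision.
  git diff --name-only --diff-filter=ACMR "$oldrev" "$newrev" |
  while IFS= read -r file; do
    # Recreate the directory structure under the destination.
    mkdir -p "$dest/$(dirname "$file")" || exit 1
    # Read the blob directly from the object store; no clone is needed.
    git show "$newrev:$file" > "$dest/$file" || exit 1
  done
}
```

A hook would create the destination with `mktemp -d`, call `export_changed_files "$oldrev" "$newrev" "$dir"`, run the validation scripts on `$dir`, and remove it afterwards. Because every invocation gets its own directory, simultaneous pushes do not collide.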

Would this be optimal considering that developers push code very frequently on to the server, sometimes even at the same time ? I'm more like looking for a location where the files are already saved by git and which I can access. — aBadAssCowboy, Feb 16 '15 at 08:56
