Finding the branch where a commit originally appears (git/Jenkins/CD)

Question

I am designing a Jenkins build system that triggers when any tag is pushed to a repo. When that happens, I want to know which branch the commit referenced by the tag was pushed to, so that I can initiate other Jenkins builds based on that branch name. Everything in this pipeline is straightforward except finding out which branch was tagged.

Basically, my team created and currently uses production and staging branches. When a dev merges changes into production or staging and wants to release, they tag the commit with a version number and push it. Jenkins can then update the production servers from tags on the production branch, and the staging servers from tags on the staging branch. If master is tagged, I'll initiate a CI build and test.

I've been testing out the method from this blog post here: http://johndstein-blog.logdown.com/posts/428667 which offers the following:

export HASH=$(git rev-parse HEAD)
export BRANCH=$(basename $(git branch -r --contains ${HASH}))
export TAG=$(basename $(git describe --all --exact-match ${HASH}))

echo "HASH: $HASH"
echo "BRANCH: $BRANCH"
echo "TAG: $TAG"

This doesn't work 100% of the time, though. For some repos, line 2 (the one that grabs the branch) returns multiple branches, and the script errors out. I am fairly new to git, but as far as I can tell, this happens because the commit was made on one branch and then merged into another.

My question then is, can I reliably find the name of the branch a commit was originally pushed to if I have a tag? Moreover, is this a smart way of doing this?

score 2 · Accepted Answer · answered Nov 09 '16 at 03:33

This is not possible in general. "Was originally pushed to" is not even well-defined without picking some "original" and making it keep logs.

Here is an example. Suppose I create branches sneak and gotcha:

     C   <-- sneak
    /
A--B     <-- master
    \
     D   <-- gotcha

Now if I git push one or both of these branches, the receiving Git repository obtains the two commits C and D along with a request to update the names refs/heads/sneak and refs/heads/gotcha. So far, all seems good. But now I do this instead of pushing, or very rapidly after pushing—fast enough that you can't get in between to see what I am doing:

$ git push origin sneak:sneak gotcha:gotcha &&
> git checkout master &&
> git merge sneak gotcha &&
> git push origin master:master :sneak :gotcha

The git merge makes an octopus merge, commit E (which of course I've arranged to have succeed, otherwise this takes too long for me to fool you :-) ). The push step then sends commit E to the server, along with a request to update refs/heads/master to point to it, and to delete refs/heads/sneak and refs/heads/gotcha. The result is:

     C
    / \
A--B---E   <-- master
    \ /
     D

Which branches were C and D committed and/or pushed on? Well, we had that information on the server for about six milliseconds, before we overwrote and deleted it.

Worse, maybe the place I push is a push mirror, and the real server is further back. The push mirror may have had the information for as much as two or three seconds, plenty of time to grab it ... but the link between the push mirror and the real (end-point) server is acting up, and during those three seconds I overwrote it, so that the push mirror winds up sending commits C, D, and E to the real server, with one single request, to update refs/heads/master to point to commit E.

Now, if we define "originally pushed to" as "sent to the push mirror", and we make the push mirror keep a log, the log will show that I originally asked for commit C to go onto sneak and commit D to go onto gotcha. Assuming the link-down glitch between the push mirror and the final central server, that log is the only place with this information. You can arrange a side channel for retrieving this, but none of that is built in to Git (even the logging is problematic: you can try to use Git's reflogs but they may not be fine-grained enough, if you care about multiple pushes per second and truly strict ordering).

Reflogs are not enabled by default for bare repositories (and push mirrors), but you can enable them by setting core.logAllRefUpdates to true with git config.
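As a rough sketch of that configuration step: the setting involved is core.logAllRefUpdates. The function name and the repository path below are placeholders of mine, not anything Git provides.

```shell
#!/bin/sh
# Sketch: turn on reflogs for a bare repository so that ref updates made by
# pushes are recorded. The function name is hypothetical.
# Usage: enable_bare_reflogs /path/to/repo.git   (placeholder path)
enable_bare_reflogs() {
    # With --git-dir set, "git config" writes to that repository's own config file.
    git --git-dir="$1" config core.logAllRefUpdates true
}
```

With the value true, Git records updates to branch names (among a few other namespaces) but not to tags; the git-config documentation describes an "always" value that covers every ref under refs/. Once enabled, something like git --git-dir=/path/to/repo.git reflog show refs/heads/master should list the recorded updates.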

All that said ...

The main thing to worry about is the fact that commits can be on zero, one, or many branches.¹ The trick is to not depend on branch names unless you are the one controlling those names. You have brief moments of control over branch names in pre-receive and post-receive hooks, but that control is tricky to use.
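To see the "zero, one, or many" behavior for a given tag, a small helper along these lines may be useful. It is only a sketch: the function name and the default ref prefix are my own choices, and it relies on git for-each-ref supporting --contains, which needs a reasonably recent Git.

```shell
#!/bin/sh
# Sketch: list the branches that contain the commit a tag points to.
# The function name and the default prefix (refs/remotes) are assumptions.
# Prints one branch per line; the output may be empty, one line, or many lines.
branches_containing_tag() {
    tag=$1
    prefix=${2:-refs/remotes}
    # ^{commit} peels an annotated tag down to the commit it references.
    git for-each-ref --format='%(refname:short)' \
        --contains="${tag}^{commit}" "$prefix"
}
```

A build script could count the lines this prints and refuse to guess a branch whenever the count is not exactly one, rather than passing the whole list to basename.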

Your best bet is not to rely on the names at all, but rather to require some separate indicator, such as a string embedded in the commit message itself (and you can have a pre-receive hook that checks this). Or, you can simply require that your tag names have a well-defined format:

production-v1.0.1
staging-v3.7

or whatever. The tag's name tells you what the intent of that specific commit is, and is quite independent of any containing branches.
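Here is a minimal sketch of how that naming convention could be used on both ends. The prefixes production-v and staging-v come from the example names above; the function names and the exact patterns are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: function names and patterns are hypothetical.

# Print the deployment target implied by a tag name; fail for any other name.
target_for_tag() {
    case "$1" in
        production-v[0-9]*) echo production ;;
        staging-v[0-9]*)    echo staging ;;
        *)                  return 1 ;;
    esac
}

# Body for a pre-receive hook. Git feeds the hook one
# "<old-value> <new-value> <refname>" line per updated ref on standard input.
check_pushed_tags() {
    while read -r old new ref; do
        # An all-zero new value means the ref is being deleted; allow that.
        case "$new" in
            *[!0]*) ;;
            *) continue ;;
        esac
        case "$ref" in
            refs/tags/*)
                name=${ref#refs/tags/}
                if ! target_for_tag "$name" >/dev/null; then
                    echo "rejected: tag '$name' does not match the naming convention" >&2
                    return 1
                fi
                ;;
        esac
    done
    return 0
}
```

On the server, hooks/pre-receive could call check_pushed_tags || exit 1 to refuse badly named tags. On the Jenkins side, target_for_tag "$TAG" gives the environment to deploy to without consulting any branch name.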

¹Commits that are on no branches are somewhat unusual, but easy to create: simply tag a tip-of-branch commit, then delete the branch name. You can push the tag and the commit goes to the receiving server with a tag, but no branch.

Absolutely wonderful explanation - not only did you answer my question, I think you taught me something valuable about git and helped me with my CD approach to boot. Thank you very much! — Min.E.On, Nov 10 '16 at 00:42
