19

We use GitHub to manage a great deal of our software environment, and I would wager that, like many other orgs, the overwhelming majority of traffic to/from that repo comes from our office. With that in mind, is there a way to build a local cache of a given GitHub repository, but still have the protection of the cloud version? I'm thinking of this in the model of a caching proxy server, where the local server (presumably in our building, on our local network) would handle the vast majority of clone/pull operations.

This seems like it should be doable, but searching for this has been very difficult, I think in no small part because the words "local" and "cache" have overloaded meanings especially for git(hub) questions.

Lætitia
ljwobker
  • *Every* repository *is* a 'local cache'. – user2864740 Aug 25 '15 at 01:09
  • Can't you just backup your local `.git` repo file? – Tim Biegeleisen Aug 25 '15 at 01:10
  • This is a performance optimization, not a backup. I want regular users to do a "git pull" or "git clone" or whatever, and instead of having that request travel to github.com, I want it to go to some local server that has a cached copy of the repo... making the process presumably much faster. We do a number of full clones every day in automated testing, and having a copy of the repo that lives on a disk geographically close to the users would make things a LOT faster. – ljwobker Aug 25 '15 at 01:24

3 Answers

13

You should check out the git-cache-http-server project. I think it partly implements what you need (and is similar to the idea from @larsks's post).

It is a NodeJS program that runs an HTTP server giving you access to locally cached git repositories. The server automatically fetches upstream changes when required. If you use those local git repositories instead of the remote ones, your git client will be served locally cached content.

If you run the git-cache-http-server on a separate host (VM or container for example), you can configure your local git client to automatically clone and fetch from the cache by configuring it to replace https://github.com with something like http://gitcache/github.com. This can be achieved by a configuration like:

git config --global url."http://gitcache:1234/".insteadOf https://
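Note that the rule above rewrites every https:// URL, not only GitHub's. A narrower variant, as a minimal sketch: it assumes the cache serves paths of the form /github.com/... as described above, reuses the example host gitcache:1234, and uses someuser/someproject as a placeholder. It runs in a scratch repository so it does not touch your real configuration.

```shell
set -e

# Scratch repository so the experiment leaves ~/.gitconfig alone.
scratch="$(mktemp -d)"
git init -q "$scratch"

# Rewrite only GitHub URLs ("gitcache:1234" is the example host, an assumption).
git -C "$scratch" config url."http://gitcache:1234/github.com/".insteadOf "https://github.com/"

# Ask git which URL it would actually contact; this does not touch the network.
rewritten="$(git -C "$scratch" ls-remote --get-url https://github.com/someuser/someproject.git)"
echo "$rewritten"
```

Once the printed URL looks right, the same `git config` line with `--global` instead of `-C "$scratch"` should apply it to every repository for that user.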

At the moment, this software only caches clone and fetch operations; there is no provision for pushing changes back through it. That is still useful for some use cases, such as the automated testing you mention, or a CI infrastructure that needs to pull the content of multiple repositories even when only one of them has changed.

Lætitia
  • I like it. It can save a lot of time when building with Docker. Please install the master branch version, since it supports http_proxy: `npm install -g git+https://git@github.com/jonasmalacofilho/git-cache-http-server.git` – Tom Shen Jan 29 '19 at 09:46
  • https://randyfay.com/content/reference-cache-repositories-speed-clones-git-clone-reference Or use this, which is built in; the answer above is generally a rough solution. – Richard Tyler Miles Sep 07 '22 at 17:10
  • [Official Documentation on Git Referencing](https://git-scm.com/docs/git-clone#git-clone---reference-if-ableltrepositorygt) – Richard Tyler Miles Sep 07 '22 at 17:13
10

Your latest comment makes it clear you're looking for a performance optimization. That helps.

You can start by creating a local mirror of the GitHub repository following these instructions. You can either update it periodically, or arrange to receive web hooks from GitHub to update the local mirror "on demand". To do this you would need to set up a small web service that responds to the hooks from GitHub. You can add a web hook by going to https://github.com/someuser/someproject/settings/hooks/new. You will probably want to select the "Let me select individual events" radio button, and then select:

  • delete
  • push
  • create

This would keep your cache up-to-date with respect to changes in available tags and branches.
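The mirror-and-refresh cycle could look roughly like this. It is only a sketch: a throwaway local repository named upstream stands in for the GitHub repository (an assumption, so that it runs without network access), and someproject.git is a hypothetical name for the mirror. The `fetch --prune` line is the part you would run from cron or from the web hook handler.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Stand-in for https://github.com/someuser/someproject.git (hypothetical).
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"

# One-time setup: a bare mirror that copies every branch and tag.
git clone -q --mirror "file://$work/upstream" someproject.git

# Upstream moves on...
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second"

# ...and the refresh step catches up, dropping refs deleted upstream.
git -C someproject.git fetch -q --prune origin

upstream_head="$(git -C upstream rev-parse HEAD)"
mirror_head="$(git -C someproject.git rev-parse HEAD)"
echo "upstream: $upstream_head"
echo "mirror:   $mirror_head"
```

Against the real repository, the clone URL would be the GitHub one and the commits would of course come from your developers.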

Next, set up a git server that makes that repository available locally. This can be as simple as running git daemon, or a local account accessible via ssh, or something more full-featured, depending on your local requirements.

Then you would set up your local working copies like this:

$ git clone http://localrepository/someproject.git
$ cd someproject
$ git remote set-url --push origin https://github.com/someuser/someproject.git

This would set up each repository to pull from your local cache, but push changes upstream to github.
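To check that a working copy really is split this way, you can compare the fetch and push URLs. A small sketch, with two empty bare repositories standing in for the local cache and for GitHub (cache.git and github.git are hypothetical names, used so the example runs offline):

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Stand-ins: cache.git plays the local mirror, github.git plays GitHub.
git init -q --bare cache.git
git init -q --bare github.git

# Clone from the cache (git warns that the repository is empty; ignored here).
git clone -q "file://$work/cache.git" someproject 2>/dev/null
cd someproject

# Fetches keep using the cache; pushes go to the upstream stand-in.
git remote set-url --push origin "file://$work/github.git"

fetch_url="$(git remote get-url origin)"
push_url="$(git remote get-url --push origin)"
git remote -v
```

`git remote -v` should list the cache URL with (fetch) and the upstream URL with (push).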

larsks
3

Look at git clone --reference-if-able to take objects from another (in your case on-site) repository.
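For example, something along these lines. This is a sketch under assumptions: upstream is a local stand-in for the GitHub repository and onsite-mirror.git is a hypothetical on-site copy, both created here so the example runs offline.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Stand-in for the GitHub repository, with one real file in it.
git init -q upstream
echo hello > upstream/README
git -C upstream add README
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first"

# The on-site repository whose objects new clones may borrow.
git clone -q --mirror "file://$work/upstream" onsite-mirror.git

# Clone from upstream, borrowing objects from the on-site copy if it exists.
git clone -q --reference-if-able "$work/onsite-mirror.git" \
    "file://$work/upstream" someproject

# The borrowed object store is recorded here.
cat someproject/.git/objects/info/alternates
```

The clone keeps depending on the reference repository through that alternates file; adding `--dissociate` to the clone copies the borrowed objects instead, at the cost of some disk space.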

Tom Hale