
So I've been reading lately about how to set up a git server, and upon finding that no specific daemon is needed at all (just an SSH server with a filesystem behind it), I started looking more closely at how git manages files under the hood.
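
As a minimal sketch of that "no daemon at all" setup, a bare repository reachable over plain SSH is the whole server; the hostname `example.com` and the `/srv/git` path below are made up for illustration:

```
# On the server: create a bare repository (any box with sshd and git will do).
ssh git@example.com 'git init --bare /srv/git/project.git'

# On a client: clone and push over plain SSH. No git-specific daemon runs on
# the server; the remote side is just git binaries working on that directory.
git clone git@example.com:/srv/git/project.git
cd project
touch README
git add README
git commit -m "first commit"
git push origin master
```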

The way each commit is represented inside the .git/objects folder, and how everything fits together, is quite clever, but it doesn't seem to be mentioned explicitly that this approach lets git achieve concurrency in a very simple way, without the need for a signaling server.
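
For anyone who wants to see that content-addressing first-hand, a couple of stock plumbing commands are enough; this is a generic illustration, not tied to any particular repository:

```
# Every commit, tree and blob lives under .git/objects, named by the SHA-1
# of its content.
git cat-file -t HEAD      # -> commit
git cat-file -p HEAD      # shows the commit's tree, parent(s), author, message

# Identical content always hashes to the same object name, which is why two
# repositories can exchange objects without ever clashing.
echo 'hello' | git hash-object --stdin

ls .git/objects/          # two-character fan-out directories plus pack files
```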

Nonetheless, there are situations in which concurrency cannot be guaranteed, essentially whenever history is rewritten (forced pushes). In this case, is there any locking strategy used in the tree to avoid concurrency issues? Is there any more documentation on this topic out there?

(Something is said about this topic in this SO answer, but only very briefly.)

knocte

2 Answers


Git's data structures are immutable, except for refs (i.e. branches/tags/etc.), so "rewriting history" is not really the right term; "creating an alternative history" is more accurate. The repository will contain all objects, both new and old. Moreover, all changes are created in the local repository first; a push merely transfers objects. Because an object's name is derived from its content, objects are unique and there is no concurrency problem while they are being sent.

Only after all objects have arrived is the reference changed. A branch is just a tiny single file (refs/heads/<branchName>) that is overwritten with a 40-character SHA-1 key. As far as I know, git performs an atomic compare-and-set update of that file: it reads the old ref value, creates a lock file, checks that the old value is unchanged, writes the new SHA-1 and deletes the lock. If the check fails, the push fails and you have to retry (i.e. optimistic locking). You can find more details in the source code, in the update_ref function.
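
As a rough illustration of that compare-and-set, the same plumbing can be driven by hand; `<newvalue>` and `<oldvalue>` below are placeholders for real commit ids, and the ref may live in .git/packed-refs rather than as a loose file:

```
# A branch is just a small file holding a 40-character commit id (if it is
# still a loose ref and has not been moved into .git/packed-refs).
cat .git/refs/heads/master

# update-ref performs the optimistic update described above: it takes
# refs/heads/master.lock and only writes <newvalue> if the ref still
# points at <oldvalue>, failing otherwise.
git update-ref refs/heads/master <newvalue> <oldvalue>

# The client-side counterpart when force-pushing is --force-with-lease,
# which asks the remote to reject the update if its ref no longer matches
# the value we last fetched.
git push --force-with-lease origin master
```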

After a force-push, some objects may become unreachable (i.e. no longer referenced from any existing ref); these are garbage-collected later.
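
A rough way to watch that happen in a throwaway repository, using only standard commands:

```
# List objects that are no longer reachable from any ref (e.g. the commits
# that were "replaced" by a force-push).
git fsck --unreachable

# Count loose and packed objects before and after cleanup.
git count-objects -v

# gc normally keeps unreachable objects for a grace period (two weeks by
# default); --prune=now discards them immediately.
git gc --prune=now
```

On a bare server repository this housekeeping is usually triggered automatically after a push via `git gc --auto` (see the `receive.autogc` setting), rather than being something the pushing client runs remotely.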

Very clever and neat.

kan
  • Thanks! Any reference about how the garbage collection works? I thought that was a local-working-copy operation, not something the client could do on the server. – knocte Nov 14 '13 at 13:26

Various files are created where necessary to act as locks. Git creates a file called .git/index.lock to lock the index, and git index-pack can create a .keep file to prevent a race condition. There may be more examples.
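
A harmless way to see the index lock at work in a scratch repository; the path is the one git really uses, but creating the file by hand is purely for demonstration, and the exact error wording may vary between git versions:

```
# Pretend another process is holding the index lock.
touch .git/index.lock

# Any command that needs to rewrite the index now refuses to run instead of
# corrupting it, with something like:
#   fatal: Unable to create '.../.git/index.lock': File exists.
git add README

# Remove the stale lock to continue.
rm .git/index.lock
```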

Robin Green
  • Worth mentioning that `index.lock` is for the git index, so it is only used for commits, i.e. in a local repo, usually by a single user. For pushing to a remote repo there is no need for the index. `index-pack` is usually run when a repository receives a pack. – kan Nov 14 '13 at 10:05