
I am reading about how git stores data/files. I do not understand the advantage of storing the whole content of a file instead of the differences between versions.

What I understand is that if git stored only the differences between versions of a file, it would take time to reconstruct any given version.

At the same time, storing the entire content instead of the differences will increase the repo size.

Am I right? Can anyone please explain this in detail?

Krishna Chaitanya
  • Have you read this: http://stackoverflow.com/a/8198276/126769 ? in particular the last paragraph. – nos Jan 04 '16 at 14:22
  • Yes, I read that already. All the paragraphs are taken from the git-scm documentation except the last one. I read the last paragraph a couple of times but did not understand it. – Krishna Chaitanya Jan 04 '16 at 14:29
  • OK, I think I understand that now: git's approach obviously increases the repo size, but once in a while git compresses all loose files using zlib and reduces the repo size. Am I right? Setting the size of the repo aside, I still don't understand the advantage of storing the entire content. – Krishna Chaitanya Jan 04 '16 at 14:34

1 Answer


From one viewpoint, it doesn't matter.

Suppose you have two version control systems, Brand X and Brand Y (we've covered up the actual brand name on the box, so you can't see how the internal storage system works).

Then we give you one command for Brand X to view a specific version of a specific historical file, and another (with the same usage) for Brand Y, e.g.:

xshowme tag:path/to/file
yshowme tag:path/to/file

Both commands show you the same data. We extend this to xcheckout and ycheckout, and so on, but always keep to higher-level commands rather than ones that deliberately expose the internal workings.

If I tell you one brand uses deltas and the other uses a content-oriented store, can you tell me which one is which? If so, how? If not, why not?
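To make the thought experiment concrete, here is a minimal Python sketch. The classes `FullStore` and `DeltaStore` and their `commit`/`show` methods are hypothetical stand-ins for "Brand X" and "Brand Y", not how any real system is implemented, and the delta format (shared prefix length, shared suffix length, replacement middle) is a deliberately naive assumption.

```python
class FullStore:
    """Hypothetical 'Brand X': keeps the complete text of every version."""

    def __init__(self):
        self._versions = []

    def commit(self, text: str) -> int:
        self._versions.append(text)
        return len(self._versions) - 1

    def show(self, version: int) -> str:
        return self._versions[version]


def _common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class DeltaStore:
    """Hypothetical 'Brand Y': keeps version 0 plus one delta per later version."""

    def __init__(self):
        self._base = None
        self._deltas = []  # each entry: (prefix_len, suffix_len, replacement_middle)

    def commit(self, text: str) -> int:
        if self._base is None:
            self._base = text
            return 0
        prev = self.show(len(self._deltas))  # rebuild the latest version
        p = _common_prefix_len(prev, text)
        # Measure the shared suffix only on what follows the shared prefix,
        # so the two regions never overlap.
        s = _common_prefix_len(prev[p:][::-1], text[p:][::-1])
        self._deltas.append((p, s, text[p:len(text) - s]))
        return len(self._deltas)

    def show(self, version: int) -> str:
        # Replay deltas on top of the base until the requested version.
        text = self._base
        for p, s, middle in self._deltas[:version]:
            text = text[:p] + middle + text[len(text) - s:]
        return text


history = ["a\nb\nc\n", "a\nB\nc\n", "a\nB\nc\nd\n"]
x, y = FullStore(), DeltaStore()
for text in history:
    x.commit(text)
    y.commit(text)

# Through the public interface alone, the two stores are indistinguishable.
same = all(x.show(i) == y.show(i) for i in range(len(history)))
print(same)  # True
```

The only observable differences are cost: `DeltaStore.show` replays every delta up to the requested version, while `FullStore` holds more data.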

(One way to try to distinguish them is to look at CPU usage, but this is foiled if the content-oriented version also uses deltas and the delta-oriented version also uses content caches. As it turns out, the content-oriented version, i.e., git, also uses deltas, just in an unusual way.)
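For a feel of what "content-oriented" means in practice, here is a rough Python sketch of how a loose blob is named and stored. It assumes Git's default SHA-1 object format; `ToyObjectStore` is a hypothetical in-memory stand-in for `.git/objects` and does not model packfiles, which is where Git applies deltas.

```python
import hashlib
import zlib


def _blob_bytes(content: bytes) -> bytes:
    """A blob object is the header 'blob <size>' + NUL byte + the raw content."""
    return f"blob {len(content)}".encode() + b"\x00" + content


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob, assuming the default SHA-1 object format."""
    return hashlib.sha1(_blob_bytes(content)).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory stand-in for loose objects under .git/objects."""

    def __init__(self):
        self.objects = {}  # object ID -> zlib-compressed object bytes

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # The key is derived from the content, so identical content is stored once.
        self.objects.setdefault(oid, zlib.compress(_blob_bytes(content)))
        return oid

    def get(self, oid: str) -> bytes:
        raw = zlib.decompress(self.objects[oid])
        return raw.split(b"\x00", 1)[1]  # drop the 'blob <size>' header


store = ToyObjectStore()
first = store.put(b"test content\n")
second = store.put(b"test content\n")  # same file content in a later commit
print(first == second, len(store.objects))  # True 1
```

Because the ID depends only on the content, a file that does not change between commits costs nothing extra to store, and each loose object is zlib-compressed on its own. Running `git hash-object` on a file with the same bytes should give the same ID as `git_blob_id`.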

torek